sanjaynagi / AmpSeeker

A snakemake workflow for amplicon sequencing
https://sanjaynagi.github.io/AmpSeeker/
0 stars 3 forks source link

Coverage notebook #34

Closed sanjaynagi closed 1 year ago

sanjaynagi commented 1 year ago

In this PR, we add a coverage notebook, which takes the mosdepth output for each sample, and plots total coverage across the whole genome, as well as coverage at SNP targets.

Will resolve #30.

I've had to slightly modify the Bed file we use to add a couple of columns of info. Its difficult, as usually bed files have no column headers, so Im not sure what the best approach is here.

This also means that the CI isn't going to work. The CI is currently based on the amplicon only reference, and I think we should change this, and instead subset to say the first quarter of chromosomal arm 2L, and add some read data only from that region.

Theres now a snakemake command --generate-unit-tests, which i may try and use for this.

sanjaynagi commented 1 year ago

also resolves #33

sanjaynagi commented 1 year ago

Ok @ChabbyTMD , now that we are no longer going down the amplicon only reference route, I have changed the test case for the CI.

Now we use a reference which is the beginning of chromosomal arm 2L:2000000-3000000, where we have a lot of amplicon targets in our data. It seems to work. Only now I've done it, I cant figure out if it was fully necessary, but nevermind. I think it was.

sanjaynagi commented 1 year ago

I've also done your suggestion and made the jupyter book rule create a symlink in the root directory called AmpSeeker-results.html which opens the results pages! :)

ChabbyTMD commented 1 year ago

This is really brilliant Sanjay. Yes going forward it's definitely better to have test data that is more akin to what the workflow will be used for and the subsected chromosome 2L will do very nicely as a test case. I'll definitely have to look up the --generate-unit-test Snakemake feature.

The bed file issue will definitely need some brainstorming. I looked this up and it seems that bed files already have designated info that goes into the other fields. So maybe we could put that additional info in some kind of pseudo-metadata file that we can call with an awk script or something idk.

sanjaynagi commented 1 year ago

The bed file issue will definitely need some brainstorming. I looked this up and it seems that bed files already have designated info that goes into the other fields. So maybe we could put that additional info in some kind of pseudo-metadata file that we can call with an awk script or something idk.

Good point. I agree, I think then we have one input panel metadata file which can be a .tsv, and consists of chrom, start, end plus whatever extra columns we want. Then for the variant calling, we do some shell command to remove the column headers before passing that file (which is now basically a .bed) to samtools mpileup. I have already renamed the config input to targets rather than bed which is good.

Merging now.