Closed EricRLucas closed 9 months ago
Hey @EricRLucas, the last thing we are missing (I think) is the file KdrMarkerSnps.csv. Best place to store this is in the resources/ folder.
It also needs to be designated as an input to the kdr-origins rule; that way, if it's not there, Snakemake will throw an error before the pipeline is run.
Hey @EricRLucas.
Probably no need to add an option in the config for the kdr_marker_snps.csv path, as it's not something we would ever want to change - the file will stay in resources/ and should always be there.
The CI runs are now failing, as they run using config files under AmpSeeker/.test/config/. We have to update these two configs every time we make changes to the main config file.
For now, I would just hard-code the path resources/Kdr_marker_snps.csv into the input of the kdr-origins rule, and pass that to papermill in the shell block.
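For illustration only, the suggestion above can be sketched in plain Python rather than in the actual Snakemake rule. The sketch hard-codes the marker path, fails early if the file is missing (roughly what a declared Snakemake input gives you), and builds the papermill command line. The function name and the notebook parameter name `kdr_marker_snps_path` are assumptions, not taken from the AmpSeeker code.

```python
from pathlib import Path

# Path hard-coded as suggested in the thread, not read from the config.
KDR_MARKER_SNPS = "resources/Kdr_marker_snps.csv"


def build_papermill_command(notebook_in, notebook_out, marker_csv=KDR_MARKER_SNPS):
    """Return the papermill CLI call as a list of arguments.

    Raises FileNotFoundError up front if the marker file is missing,
    which is roughly the early failure a declared Snakemake input provides.
    """
    if not Path(marker_csv).is_file():
        raise FileNotFoundError(f"Missing required input: {marker_csv}")
    return [
        "papermill", str(notebook_in), str(notebook_out),
        # "kdr_marker_snps_path" is a hypothetical notebook parameter name.
        "-p", "kdr_marker_snps_path", str(marker_csv),
    ]
```

In the real rule, Snakemake does the existence check itself once the file is listed under `input:`, so the explicit check here only mirrors that behaviour.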
@sanjaynagi Cool. There seem to be several configs in .test. Shall I modify config_agvampir.yaml?
I've fixed it @EricRLucas. No need to edit the .test configs further, as I've removed that option from the main config.
Very confusing that it's failing atm. It can't find the file in resources/ag-vampir/, but it's definitely there and working locally.
Ahh, I forgot that the .test/ folder needs its own resources/.
@sanjaynagi Not sure why it's throwing this error. It works fine with the VCF that you gave me to test the notebook on. What VCF does it use in the testing? I can have a look at what happens when I run locally with that VCF.
@EricRLucas it's all good - I realised that the current test data uses a mini 'reference' genome which is 2L:2,000,000-3,000,000. As a result, all the coordinates are off, so we don't find any intersecting variants in the notebook when running through CI.
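A minimal sketch of why the intersection comes back empty: positions called against the mini reference are relative to the start of the 2L:2,000,000-3,000,000 window, while the marker file uses whole-genome coordinates. The positions below are hypothetical, and the offset assumes the window was extracted with 1-based inclusive coordinates.

```python
# Start of the mini reference on 2L (assumed 1-based, inclusive).
REGION_START = 2_000_000


def to_genome_pos(local_pos, region_start=REGION_START):
    """Convert a 1-based position in the mini reference to a 2L position."""
    return local_pos + region_start - 1


# Hypothetical marker positions in whole-genome coordinates.
marker_positions = {2_400_000, 2_400_100}
# Hypothetical positions as called against the mini reference.
local_calls = [400_001, 400_101, 500_000]

# Comparing directly finds nothing, because the two coordinate systems differ.
naive_hits = marker_positions.intersection(local_calls)
# After shifting into genome coordinates, the markers are recovered.
shifted_hits = marker_positions.intersection(to_genome_pos(p) for p in local_calls)

print(sorted(naive_hits))    # []
print(sorted(shifted_hits))  # [2400000, 2400100]
```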
I'm going to resolve this, probably by re-doing how we do the reference for the test data. I'll probably just add a wget command before the CI runs to download the whole AgamP4 FASTA file, so when we align to it, we have proper coordinates. I'll probably do that in another PR, so I'll merge this soon anyway. Thank you once again!
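The thread proposes a wget step in CI. As a rough Python equivalent (a swapped-in technique, not what the PR does), a small helper could fetch the reference once and reuse it on later runs. The function name is hypothetical, and the URL is deliberately left as a parameter because the thread does not give one.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch_reference(url, dest):
    """Download the reference FASTA to dest unless it is already present."""
    dest = Path(dest)
    if dest.exists():
        return dest  # reuse the existing copy instead of downloading again
    dest.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(dest))
    return dest
```

Skipping the download when the file already exists keeps repeated local test runs fast. Whether the CI runners cache the file between runs is a separate question that the thread does not address.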
@sanjaynagi Cool, though sounds like you'll need more than just the correct coordinates, because your current reference genome doesn't actually include the kdr region, so none of the SNPs will have genotype calls.
@EricRLucas kdr is within 2-3Mb of 2L? But in any case, downloading the whole reference, mapping to that, and genotyping all target SNPs will be much better.
@sanjaynagi Ah yes, sorry, I read it as 20-30Mb.
Added kdr origins notebook.
Partially addresses #38.