malariagen / vector-data

Documentation, discussion forum and issue tracker for malaria vector genomic data released by MalariaGEN.
https://malariagen.github.io/vector-data/

Update references to use multi-region bucket #86

Closed: jonbrenas closed this pull request 1 month ago

jonbrenas commented 1 month ago

Addresses #85.

The link to the vobs-funestus project also used the old projects path, which I thought could be fixed at the same time.
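Here is a minimal sketch of the kind of change involved. It assumes the references live in the `source` of notebook cells and that the old and new locations can each be written as a plain URL prefix. The thread does not give the actual bucket names, so `old_prefix` and `new_prefix` (and the `gs://old-bucket/` values in the usage line) are hypothetical placeholders. The helper name `rewrite_bucket_refs` is also illustrative.

```python
import json
from pathlib import Path


def rewrite_bucket_refs(nb_path, old_prefix, new_prefix):
    """Replace old_prefix with new_prefix in the source of every notebook cell.

    Returns the number of occurrences replaced. Cell outputs are not touched,
    and the file is only rewritten if something changed.
    """
    path = Path(nb_path)
    nb = json.loads(path.read_text(encoding="utf-8"))
    count = 0
    for cell in nb.get("cells", []):
        source = cell.get("source", [])
        # The notebook format allows source to be a single string or a list of lines.
        if isinstance(source, str):
            count += source.count(old_prefix)
            cell["source"] = source.replace(old_prefix, new_prefix)
        else:
            count += sum(line.count(old_prefix) for line in source)
            cell["source"] = [line.replace(old_prefix, new_prefix) for line in source]
    if count:
        path.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
    return count
```

Usage (placeholder prefixes): `rewrite_bucket_refs("af1/download.ipynb", "gs://old-bucket/", "gs://new-bucket/")`. Outputs are left alone on purpose, because the notebooks are meant to be re-run after the change, which regenerates them.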

jonbrenas commented 1 month ago

@ahernank: Am I supposed to run the notebooks after modifying them? I am thinking particularly of [af1/|ag3/]download.ipynb, which show examples of how to download data. They seem to be only half-run in the current version of the VUG (and the cells that were not run would raise an error).

ahernank commented 1 month ago

Thanks @jonbrenas -- yes, it would be great if you could re-run them. What we usually do is run them once (as our check that all the code works as expected), then do a second run, clearing all outputs and keeping only the cells we want to show (i.e. clearing anything that is too messy). That's why they appear half-run. When you run them, it is only to make sure the path in the example exists and is correct. There is no need to wait for the full download to finish, as some of those examples have huge downloads!
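Since the point of the run is only to confirm that each path exists, one option is to send an HTTP HEAD request instead of starting the download. From the command line, `wget --spider URL` does a similar check. The minimal Python sketch below assumes the bucket serves public objects over HTTPS and answers HEAD requests. Some object stores or proxies may reject HEAD, in which case this would report a false negative. The helper name `url_exists` is hypothetical.

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=10):
    """Return True if a HEAD request for `url` gets a 2xx response.

    Only the response headers are fetched, so large files are not downloaded.
    """
    try:
        # Building the Request inside the try also catches malformed URLs,
        # which urllib reports as ValueError.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # HTTPError (e.g. 404) is a subclass of URLError; refused connections
        # and timeouts surface as URLError or another OSError.
        return False
```

For example, `url_exists("https://1229-vo-gh-dadzie-vmf00095.cog.sanger.ac.uk/VBS24195.vcf.gz")` checks the first file used in af1/download.ipynb without fetching it.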

jonbrenas commented 1 month ago

Thank you @ahernank. That's what I expected, but, for instance, in af1/download.ipynb, one cell does:

!wget --no-clobber https://1229-vo-gh-dadzie-vmf00095.cog.sanger.ac.uk/VBS24195.vcf.gz
!wget --no-clobber https://1229-vo-gh-dadzie-vmf00095.cog.sanger.ac.uk/VBS24195.vcf.gz.tbi

and the next does:

!bcftools merge --output-type z --regions 3RL:1-1000000 --output merged.vcf.gz VBS24195.vcf.gz VBS24196.vcf.gz 

which is going to raise an error because VBS24196.vcf.gz was not downloaded. Should I modify the code so that all cells can be run without raising an error?
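One way to make the merge cell fail with a clear message instead of a bcftools error is a pre-flight check that every input, and its `.tbi` index, is present locally. bcftools merge expects indexed inputs by default, and the `--regions` option relies on the index. The sketch below is only that guard. It does not fetch VBS24196.vcf.gz, because the thread does not say which URL that file lives at. The helper name `missing_merge_inputs` is hypothetical.

```python
from pathlib import Path


def missing_merge_inputs(vcf_files, workdir="."):
    """Return the VCF files, and their .tbi indexes, that are absent from workdir.

    Order follows the input list, with each file listed before its index.
    """
    missing = []
    for name in vcf_files:
        for required in (name, name + ".tbi"):
            if not (Path(workdir) / required).exists():
                missing.append(required)
    return missing
```

In the notebook, the merge cell could be preceded by something like `assert not missing_merge_inputs(["VBS24195.vcf.gz", "VBS24196.vcf.gz"])`. After only the two wget commands above have run, this reports `VBS24196.vcf.gz` and `VBS24196.vcf.gz.tbi` as missing.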

jonbrenas commented 1 month ago

A quick grep seemed to indicate that cloud.ipynb and download.ipynb were the only files that referenced the actual bucket.
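The same search can be repeated from Python, for example after the edits, to confirm that no stale references remain. It is roughly equivalent to `grep -rl --include='*.ipynb' NEEDLE .`. Like grep, it searches the raw notebook JSON, so references in cell outputs are found too. The helper name `files_referencing` and the needle values are illustrative.

```python
from pathlib import Path


def files_referencing(root, needle, pattern="*.ipynb"):
    """Return files under root (searched recursively) whose text contains needle."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file() and needle in path.read_text(encoding="utf-8", errors="replace"):
            hits.append(path)
    return hits
```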

alimanfoo commented 1 month ago

Hi @jonbrenas, no need to rerun the bcftools merge command; I think we can just assume it's correct.