jbloomlab / SARS-CoV-2-RBD_DMS

Deep mutational scanning of the receptor-binding domain of SARS-CoV-2 Spike
BSD 3-Clause "New" or "Revised" License

set up to analyze multiple PacBio amplicons #5

Closed: jbloom closed this 4 years ago.

jbloom commented 4 years ago

The ./data/PacBio_amplicons.gb file now contains all the different potential amplicons with appropriate names.
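A multi-record GenBank file simply concatenates records: each one starts with a `LOCUS` line, carries its bases in an `ORIGIN` section, and ends with `//`. The snippet below is a rough, hypothetical sketch of pulling every amplicon name and length out of such a file. It is not the code in process_ccs.ipynb. The function name `parse_genbank_records` and the `amplicon_A`/`amplicon_B` records are made up for illustration. For real work, Biopython's `SeqIO.parse(handle, "genbank")` handles the full format properly.

```python
def parse_genbank_records(text):
    """Split multi-record GenBank text into {LOCUS name: sequence}.

    Hypothetical, simplified sketch: assumes each record starts with a
    LOCUS line, stores its bases in an ORIGIN section, and ends with '//'.
    Features and other annotations are ignored.
    """
    records = {}
    name = None
    seq_chunks = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            # The second whitespace-separated field of LOCUS is the record name
            name = line.split()[1]
            seq_chunks = []
            in_origin = False
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: store the joined, upper-cased sequence
            if name is not None:
                records[name] = "".join(seq_chunks).upper()
            name = None
            in_origin = False
        elif in_origin:
            # Sequence lines look like "        1 acgtacgtac gt": drop the
            # leading position number and keep the blocks of bases
            seq_chunks.extend(line.split()[1:])
    return records


# Hypothetical two-amplicon file, not real PacBio_amplicons.gb content
example = """LOCUS       amplicon_A   12 bp    DNA     linear   SYN
ORIGIN
        1 acgtacgtac gt
//
LOCUS       amplicon_B   8 bp    DNA     linear   SYN
ORIGIN
        1 ttttgggg
//
"""

for amplicon_name, amplicon_seq in parse_genbank_records(example).items():
    print(amplicon_name, len(amplicon_seq))
```

Keying the result by the `LOCUS` name is what makes "appropriate names" in the file useful downstream: each amplicon can be looked up and its length checked by name.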

The process_ccs.ipynb notebook now reads in this full set of amplicons as potential targets.
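Having every amplicon available as a potential target means each read can be assigned to whichever target it matches best. The sketch below is a toy illustration of that idea using shared k-mers. It is not what the notebook does: a real pipeline would align reads to the targets. `kmers`, `assign_target`, `k=5`, and the target sequences are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_target(read, targets, k=5):
    """Return the name of the target sharing the most k-mers with read.

    Toy illustration only: ties resolve to the first target in dict order,
    and there is no minimum-score cutoff for reads matching nothing.
    """
    read_kmers = kmers(read.upper(), k)
    best_name, best_score = None, -1
    for name, seq in targets.items():
        score = len(read_kmers & kmers(seq.upper(), k))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical targets standing in for the amplicons in PacBio_amplicons.gb
targets = {
    "amplicon_A": "ACGTACGTACGTTTGACC",
    "amplicon_B": "GGGGCCCCAAAATTTTGG",
}

print(assign_target("CGTACGTTTGA", targets))
print(assign_target("CCAAAATTTT", targets))
```

The point of the design is simply that a read can only be matched to a target that was read in. Anything downstream that still assumes a single amplicon will break, which is consistent with the note below.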

The ./data/README.md has been updated to better describe this and the other input data.

Note: the full pipeline is not yet set up to handle multiple amplicons, so it will break somewhere midway through process_ccs.ipynb.

tylernstarr commented 4 years ago

In the notebook process_ccs.ipynb, the schematics of the amplicon constructs appear to have some bugs. Perhaps this is past the point where you did the troubleshooting?

jbloom commented 4 years ago

@tylernstarr, thanks for catching these. The redundant README lines are now removed in e42d90d.

As for the site mislabeling in the images, this is actually a bug in dna_features_viewer. The sites are not actually wrong; the tick labels are just rendered incorrectly. I've submitted a pull request to dna_features_viewer (see here) to fix that, so once they merge it we can fix the numbering in the images.

I'm pretty sure the lengths are correct; it's just that the labeling is not very clear. Each title sits above its image, so GD-Pangolin is actually the second image and HKU3-1 is actually the third. Once you notice this, GD-Pangolin is longer than HKU3-1, as expected. I agree the titles are not ideally placed and the last image is missing its title, but I think the schematics are all correct, just badly formatted.

If that's OK with you, I'd suggest we merge this even with the problematic image formatting. Then, once my dna_features_viewer pull request fixes the numbering, I can work on reformatting the titles too. None of this should matter for the actual analyses.