Closed susheelbhanu closed 1 year ago
Hi @susheelbhanu ,
I was aware that this was an issue (according to a former developer) but I couldn't find an example with which to replicate the error. Would you be willing to send me the BAM file or .csv.gz file that metascope_id is accessing so that I can try to find a workaround? Happy to send you a Google Shared Drive link so storage space isn't a roadblock.
Thanks, Aubrey
Thanks @aubreyodom for being willing to tackle this. I've uploaded two of the BAM files to Zenodo here: https://doi.org/10.5281/zenodo.8327966
Caution: each file is at least 6 GB, so let me know if you can't access the files.
P.S. I used the same raw reads to test the full RefSeq database and the SILVA138 indices that you had published in the PathoScope2.0 paper. I had no such issues with metascope_id using RefSeq; it's SILVA138 that is causing this ID length issue. Not sure if this information is relevant, but FYI just in case.
Thank you!
@susheelbhanu The problem makes more sense now that you mention using the SILVA138 files. The SILVA analysis in our paper was run a few years ago on PathoScope by another researcher, so I haven't tried to run it on MetaScope. metascope_id is currently formatted to grab NCBI accession numbers for RefSeq sequence names, so it's not going to work out of the box with SILVA. I'll look into it, but it may not be a quick fix for that reason.
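To illustrate the mismatch described above, here is a minimal Python sketch. It is not MetaScope code. It assumes RefSeq names follow the usual accession pattern (a two-letter prefix, an underscore, an alphanumeric body, and an optional version, e.g. `NC_000913.3`), and that SILVA headers look like `AB001438.1.1662` followed by a taxonomy path. The helper `looks_like_refseq` and the sample names are hypothetical and only show why a RefSeq-oriented lookup would not recognize SILVA names.

```python
import re

# Assumed RefSeq accession shape: two-letter prefix, underscore,
# alphanumeric body, optional ".version" suffix (e.g. NC_000913.3).
REFSEQ_ACCESSION = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(?:\.\d+)?$")


def looks_like_refseq(reference_name: str) -> bool:
    """Return True if the first whitespace-delimited token of a
    reference name matches the assumed RefSeq accession pattern."""
    if not reference_name.strip():
        return False
    token = reference_name.split()[0]
    return bool(REFSEQ_ACCESSION.match(token))


# Hypothetical reference names: two RefSeq-style, one SILVA-style.
sample_names = [
    "NC_000913.3",
    "NZ_CP009273.1",
    "AB001438.1.1662 Bacteria;Proteobacteria",
]

for name in sample_names:
    label = "RefSeq-style" if looks_like_refseq(name) else "not RefSeq-style"
    print(f"{name!r}: {label}")
```

A check like this could be run over the reference names in a BAM header before the ID step to see which database convention the alignments use.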
Thanks @aubreyodom. I suppose that makes sense. In that case, and maybe related to the other open issue: how does one use *metascope_id* with a different database?
Alternatively, is there a custom function to pull the names from the "different" database? I suppose these are all feature requests for now. Either way, thank you for looking into this.
This might help too, @aubreyodom : https://github.com/pirovc/multitax
Ok, sorry for the delay! Since this is a SILVA issue I'm going to close this issue and leave the other one open since it is more relevant. Here's the update I posted in the other issue @susheelbhanu:
Just an update for folks wanting to use another reference database - we are actively working on this issue and should have an update for metascope_id in the coming months (if not sooner). I'm particularly interested in trying out Greengenes2 and Silva myself. Stay tuned.
Hey @aubreyodom,
I'm running the metascope_id step on some samples and running into the issue below regarding the length of the requested IDs. Do you have a workaround for this, especially when one is working with a complex microbiome? Sorry if I missed something in the documentation.
Thanks!