sjanssen2 opened 6 years ago
We could store the references in the FTP server.
Hi @sjanssen2,
Would it be possible in the near future to also create and make available in QIIME2 a pre-compiled SILVA v132 database? I note your comment here that making the database ready for use in q2-fragment-insertion takes around 2 weeks, which is my main reason for not attempting the steps outlined here by @smirarab.
It's great that a pre-compiled SILVA v128 database comes packaged with this plugin in QIIME! I've simply already done some analysis with SILVA v132 and am on a tight schedule, so don't have the time to re-analyse with 128 - at the moment this unfortunately prevents me from using the fragment insertion method to build trees.
Cheers, Rachael
Hey there @rachaellappan --- we would love to get some help with this task - are you interested? If you don't have the bandwidth, maybe you could cross-post this request to the QIIME 2 Forum, that way more eyes see this? Thanks!
Just adding to the discussion: for the GG release we did a lot of benchmarks, and this is basically what was used in the fragment insertion paper. However, AFAIK, such benchmarks have not been done for SILVA, so it would be great if someone actually did them, in case @rachaellappan is interested.
Regarding benchmarks: there is already a lot of infrastructure in place, for example the wonderful repo https://github.com/caporaso-lab/tax-credit-data/, which I used a couple of months ago to add SEPP as another tool to assign taxonomy, and of course all the notebooks I used for our paper https://msystems.asm.org/content/3/3/e00021-18
I think we should first provide the necessary changes for SEPP to deal with different references before we think too hard about benchmark results.
I'll argue that having them at the same time would be great; as you can imagine, once it's out there, it's out there, and if there is a bug or something wrong that wasn't caught because there were no benchmarks, it can get ugly... my 2 pesos!
Hi @thermokarst, I will post to the QIIME2 forum. I would like to help out but I'm not very familiar with what is being done here and whether these steps are all that's required.
If I understand correctly, I agree that benchmarking SILVA (to demonstrate/confirm the improvement that fragment insertion offers over de novo trees in the case of SILVA?) would be ideal to do around the same time as providing v132 for SEPP. The SILVA aligned rep set doesn't specify whether it's 16S or 18S - does it contain both? - so the results may be different to GG.
I'm probably not the person to do this - no experience with benchmarking =)
The file used for the SILVA package is described here: https://github.com/smirarab/sepp-refs/blob/master/silva/README.md
It was called SILVA_128_QIIME_release/rep_set_aligned/99/99_otus_aligned.fasta.gz
Does anyone know if that file did or did not include 18S?
-- Siavash Mirarab
In case this hasn't been done yet, I would be glad to pitch in. But I would need the scripts required to process the QIIME formatted SILVA file (SILVA_132_QIIME_release/rep_set_aligned/99/99_alignment.fna)
Can anyone confirm if these modified steps would be right (taken from https://github.com/smirarab/sepp-refs/tree/master/silva)?
99_alignment.fna has 425098 sequences:
run_seqtools.py -masksites 2125 -infile 99_alignment.fna -outfile 99_alignment_masked.fna
nw_topology -bI 99_otus.tre > 99_otus_nice.tree
raxmlHPC-PTHREADS -s 99_alignment_masked.fna -m GTRCAT -n scoreF-99_alignment_masked.fna -g 99_otus_nice.tree -F -T 24 -p 8956
raxmlHPC-PTHREADS -s 99_alignment_masked.fna -m GTRCAT -n score-bl-99_alignment_masked.fna -F -f e -t RAxML_result.scoreF-99_alignment_masked.fna -T 24 -p 10625
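The four steps above can be scripted so that each command is built as an argument list, which avoids slips such as two arguments running together. This is a minimal sketch, not an official procedure: the file names and numeric parameters (`-masksites 2125`, `-T 24`, the `-p` seeds) are copied from the comment, `build_pipeline`/`run_pipeline` are hypothetical helper names, and it assumes `run_seqtools.py`, `nw_topology` and `raxmlHPC-PTHREADS` are on `PATH`. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List, Optional, Tuple

# One step = (argument list, optional file that receives stdout).
Step = Tuple[List[str], Optional[str]]


def build_pipeline(alignment: str = "99_alignment.fna",
                   tree: str = "99_otus.tre",
                   nice_tree: str = "99_otus_nice.tree",
                   masksites: int = 2125,
                   threads: int = 24) -> List[Step]:
    """Hypothetical helper: assemble the commands quoted in the thread.

    Values are copied from the thread; whether 2125 suits SILVA 132
    is still an open question.
    """
    masked = alignment.replace(".fna", "_masked.fna")
    return [
        (["run_seqtools.py", "-masksites", str(masksites),
          "-infile", alignment, "-outfile", masked], None),
        # nw_topology writes the reformatted tree to stdout.
        (["nw_topology", "-bI", tree], nice_tree),
        (["raxmlHPC-PTHREADS", "-s", masked, "-m", "GTRCAT",
          "-n", f"scoreF-{masked}", "-g", nice_tree,
          "-F", "-T", str(threads), "-p", "8956"], None),
        (["raxmlHPC-PTHREADS", "-s", masked, "-m", "GTRCAT",
          "-n", f"score-bl-{masked}", "-F", "-f", "e",
          "-t", f"RAxML_result.scoreF-{masked}",
          "-T", str(threads), "-p", "10625"], None),
    ]


def run_pipeline(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Print each step as a shell line; execute it unless dry_run is set."""
    lines = []
    for cmd, stdout_path in steps:
        line = shlex.join(cmd) + (f" > {stdout_path}" if stdout_path else "")
        lines.append(line)
        print(line)
        if dry_run:
            continue
        if stdout_path:
            with open(stdout_path, "w") as out:
                subprocess.run(cmd, stdout=out, check=True)
        else:
            subprocess.run(cmd, check=True)
    return lines


if __name__ == "__main__":
    run_pipeline(build_pipeline())
```

Passing `dry_run=False` would run the steps in order and stop at the first failing command, since `check=True` raises on a non-zero exit status.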
Is this issue still alive?
Hi Aditya, yes, it is still current, but maybe not very active at the moment. I am busy meeting important deadlines until mid-March. After that, this is on my to-do list, and help is extremely welcome, since I think this issue is a showstopper for many application scenarios.
Hi Stefan
Sure. I was wondering if I could get started on this at my end, since it's a heavy computation. All I need is for someone to confirm the steps that need to be run.
Of course, I will share the files for review once done, and perhaps that would be mid-March already.
All I know about SILVA is what Siavash did to convert/prepare the data for SILVA 128 (https://github.com/smirarab/sepp-refs/tree/master/silva). Maybe you can infer from it whether you are dealing with the correct files?
Yes, Stefan, I went through what Siavash had done and am sure I have the correct files. I wasn't entirely clear, though, on how the masksites parameter was chosen for the first step. That's where I need some advice, as the total number of sequences is different for v132.
Perhaps @smirarab can pitch in?
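On choosing `-masksites`: assuming (this is not confirmed in the thread) that `run_seqtools.py -masksites N` drops alignment columns with fewer than N non-gap characters, one way to reason about a threshold for v132 is to look at how many columns survive at different cutoffs. The sketch below is a pure-Python illustration with hypothetical helper names; it treats both `-` and `.` as gaps and is only practical on small subsets, since SILVA alignments are very wide.

```python
from typing import Dict, Iterable, List

# Assumption: both '-' and '.' are gap characters in the SILVA alignment.
GAP_CHARS = frozenset("-.")


def column_occupancy(seqs: List[str]) -> List[int]:
    """Count non-gap characters in every alignment column."""
    width = len(seqs[0])
    counts = [0] * width
    for seq in seqs:
        if len(seq) != width:
            raise ValueError("sequences are not aligned (unequal lengths)")
        for i, char in enumerate(seq):
            if char not in GAP_CHARS:
                counts[i] += 1
    return counts


def mask_sites(seqs: List[str], min_non_gap: int) -> List[str]:
    """Keep only columns with at least `min_non_gap` non-gap characters."""
    counts = column_occupancy(seqs)
    keep = [i for i, n in enumerate(counts) if n >= min_non_gap]
    return ["".join(seq[i] for i in keep) for seq in seqs]


def surviving_columns(seqs: List[str], thresholds: Iterable[int]) -> Dict[int, int]:
    """How many columns would remain for each candidate threshold."""
    counts = column_occupancy(seqs)
    return {t: sum(n >= t for n in counts) for t in thresholds}


if __name__ == "__main__":
    toy = ["AC-G", "A--G", "A.-T"]
    print(surviving_columns(toy, [1, 2, 3]))  # {1: 3, 2: 2, 3: 2}
    print(mask_sites(toy, 2))                 # ['AG', 'AG', 'AT']
```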
Oops, now I see that you had already pointed to this link. Sorry for not paying enough attention :-/
Any updates on this? We are well past mid-March.
Hi Aditya,
Fair point, and sorry for the delay. I started working on SEPP itself to make changing the reference convenient for QIIME 2 users. This involves a) adding SEPP to a CI system (Travis), b) updating the code style, c) adding the ability to pass info files to the SEPP binaries, and d) packaging SEPP as a bioconda recipe. I would be happy to receive code reviews on https://github.com/smirarab/sepp/pull/41, which would increase its visibility and quality.
I just downloaded the 3 GB QIIME-compatible SILVA 132 release (https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_132_release.zip). I am pretty confident that the alignment file is SILVA_132_QIIME_release/rep_set_aligned/99/99_alignment.fna.zip and the matching phylogeny is SILVA_132_QIIME_release/trees/99/99_otus.tre. Both hold the very same 425,098 identifiers.
I figure you already know the right computational steps to perform, but I am not totally sure whether the numeric parameters will also work for the slightly larger 132 release. I guess we will learn that the hard way :-/
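To double-check that the alignment and the tree hold the same identifiers (and to re-run that check after any filtering), a small stdlib-only comparison is enough. This sketch uses hypothetical helper names and a simplified Newick reader that assumes unquoted leaf labels; it ignores internal node labels such as bootstrap values.

```python
import re
from typing import Iterable, Set, Tuple

# A leaf label follows '(' or ',' and runs until ( ) [ ] : , ; ' or whitespace.
# Internal node labels follow ')' and are therefore not matched.
_LEAF = re.compile(r"[(,]\s*([^()\[\]:,;'\s]+)")


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """First word of every FASTA header line."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}


def newick_leaf_labels(newick: str) -> Set[str]:
    """Leaf labels of an unquoted Newick string."""
    return set(_LEAF.findall(newick))


def compare(alignment_ids: Set[str], tree_ids: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (IDs only in the alignment, IDs only in the tree)."""
    return alignment_ids - tree_ids, tree_ids - alignment_ids


if __name__ == "__main__":
    ids = fasta_ids([">A some description\n", "ACGT\n", ">B\n", "AC-T\n", ">C\n", "TT\n"])
    leaves = newick_leaf_labels("((A:0.1,B:0.2)90:0.3,C:0.4);")
    print(compare(ids, leaves))  # (set(), set())
```

For the real files, `fasta_ids` can be given an open file handle directly, and the tree can be read with `open("99_otus.tre").read()`.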
Aditya,
Sorry for the long silence on this.
The steps you mentioned are mostly correct. However, at the end, you need to root the tree at the LCA of Archaea.
Hope this helps.
Regards Siavash
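Rooting at the LCA of Archaea first requires finding that node. As a minimal, library-free sketch (hypothetical helper names), assume the tree is already loaded as a child-to-parent mapping under some arbitrary rooting; the LCA of the archaeal leaves is then the deepest node shared by all their paths to the root. If the archaea straddle the current root, the function returns the root itself, and the tree has to be rerooted elsewhere (for example on a bacterial leaf) before trying again. The rerooting itself would be done with a tree tool, for instance Newick Utilities' `nw_reroot`, which can reroot on the LCA of a set of labels (check its documentation for exact usage).

```python
from typing import Dict, Iterable, List

Parent = Dict[str, str]  # child -> parent; the root has no entry


def path_to_root(parent: Parent, node: str) -> List[str]:
    """The node itself followed by all its ancestors up to the root."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def lowest_common_ancestor(parent: Parent, nodes: Iterable[str]) -> str:
    """Deepest node that is an ancestor (or self) of every node in `nodes`."""
    nodes = list(nodes)
    if not nodes:
        raise ValueError("need at least one node")
    candidates = path_to_root(parent, nodes[0])
    for other in nodes[1:]:
        ancestors = set(path_to_root(parent, other))
        # Drop candidates until reaching one that is also above `other`.
        for i, node in enumerate(candidates):
            if node in ancestors:
                candidates = candidates[i:]
                break
        else:
            raise ValueError(f"{other!r} is not in the same tree")
    return candidates[0]


if __name__ == "__main__":
    # Newick equivalent: (((A,B)n1,C)n2,D)root;
    tree = {"A": "n1", "B": "n1", "n1": "n2", "C": "n2", "n2": "root", "D": "root"}
    print(lowest_common_ancestor(tree, ["A", "B"]))  # n1
    print(lowest_common_ancestor(tree, ["A", "C"]))  # n2
```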
I am trying to create a bioconda recipe for Siavash's SEPP program (without the large reference files) to support, in the long run, different references such as SILVA. Currently, the recipe fails linting because I don't know how to handle the fact that Python is, in principle, platform-independent, while SEPP ships precompiled, platform-dependent binaries. Can someone please help, maybe @thermokarst or @ebolyen?
Is this something being still considered?
The bioconda package has been created: https://anaconda.org/bioconda/sepp (without reference files), but is not yet integrated into Qiime2.
Stefan, that's great to hear. Are the updated reference files for SILVA available as well?
Hi @adityabandla,
Files for SILVA 128 (phylogeny, alignment and info) ship with the default QIIME 2 install and should be located in $CONDA_PREFIX/share/fragment-insertion/ref
(activate your conda environment first such that CONDA_PREFIX points to the right directory).
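For scripts that need those files, the directory can be resolved from the activated environment. A minimal sketch with a hypothetical helper name; it only lists whatever files are present and makes no assumption about their names.

```python
import os
from pathlib import Path
from typing import List, Optional


def fragment_insertion_refs(conda_prefix: Optional[str] = None) -> List[Path]:
    """List the files in $CONDA_PREFIX/share/fragment-insertion/ref."""
    prefix = conda_prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("CONDA_PREFIX is not set; activate the conda environment first")
    ref_dir = Path(prefix) / "share" / "fragment-insertion" / "ref"
    if not ref_dir.is_dir():
        raise FileNotFoundError(f"no reference directory at {ref_dir}")
    return sorted(p for p in ref_dir.iterdir() if p.is_file())
```

Inside the activated environment, `for path in fragment_insertion_refs(): print(path)` prints the full path of each shipped file.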
Did you succeed in creating a reference for SILVA 132? If so, would you be willing to share those files with me / the QIIME community?
My PR #32 contains the necessary updates for the QIIME 2 wrapper to cope with the new parameter for the info file, but it is not yet merged into master. Thus, to use references other than Greengenes 13.8, you either have to overwrite the info file each time or use the run-sepp.sh script directly.
Best, Stefan
Hi Stefan
Sorry, I never managed to get to it. I just started, and I ran into this error at the very first step:
Traceback (most recent call last):
  File "run_seqtools.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "run_seqtools.py", line 36, in <module>
    alg.read_file_object(args.infile,args.informat)
  File "alignment.py", line 1335, in read_file_object
    for name, seq in read_func(file_obj):
  File "alignment.py", line 75, in read_fasta
    raise Exception("Error: illegal characeters in sequence at line %d" % line_number)
Exception: Error: illegal characeters in sequence at line 1
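One way to narrow down an error like this is to look for characters a strict FASTA reader might reject. The sketch below assumes (it is not taken from SEPP's source) that letters, `-` and `?` are acceptable, and reports the first sequence line containing anything else, such as a carriage return left over from Windows line endings or the `.` characters used in SILVA alignments. Its line numbers count physical lines in the file, which may not match the number SEPP reports. `first_illegal` is a hypothetical helper name.

```python
import string
from typing import Iterable, List, Optional, Tuple

# Assumption: the accepted alphabet; SEPP's actual rule may differ.
ALLOWED = frozenset(string.ascii_letters + "-?")


def first_illegal(lines: Iterable[str]) -> Optional[Tuple[int, List[str]]]:
    """Return (line number, offending characters) for the first bad sequence line."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")  # keep '\r' so CRLF files are reported
        if not line or line.startswith(">"):
            continue
        bad = sorted(set(line) - ALLOWED)
        if bad:
            return number, bad
    return None


if __name__ == "__main__":
    print(first_illegal([">seq1\n", "AC..GT\n"]))  # (2, ['.'])
```

On the real file, `with open("99_alignment.fna") as fh: print(first_illegal(fh))` reads it line by line without loading it into memory.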
Hi @adityabandla, I would need much more information about what you are trying to execute to be able to help debug this.
I get that error when running the following command:
run_seqtools.py -masksites 2125 -infile 99_alignment.fna -outfile 99_alignment_masked.fna
Please let me know if you need additional details
Aditya, is there a place where I can access the 99_alignment.fna file? I can try to have a look.
-- Siavash Mirarab
@smirarab Siavash, it's the file I downloaded from the SILVA website, https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_132_release.zip, specifically SILVA_132_QIIME_release/rep_set_aligned/99/99_alignment.fna.zip
@adityabandla @smirarab is there any progress on using SILVA 132?
I am starting to work on this. Does anyone know if unaligned sites (alignment sites with a dot) should be removed?
-- Siavash Mirarab
I have been working on this and now have the trees, but I am having trouble rooting them. There are several problematic taxa, mentioned below.
Is anyone more familiar with SILVA able to advise what's best to do here? Should we just remove these? Are they simply misclassified? Or perhaps I am using the wrong taxonomy file (SILVA_132_QIIME_release/taxonomy/taxonomy_all/99/raw_taxonomy.txt)?
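A quick way to see which sequences break the archaeal clade is to take the LCA of all leaves whose taxonomy starts with Archaea and list every non-archaeal leaf beneath it. This sketch uses hypothetical helper names, represents the tree as a child-to-parent mapping under its current rooting, and assumes each taxonomy line is `ID<TAB>rank1;rank2;...` with an optional `D_0__`-style prefix on the first rank; the actual format of `raw_taxonomy.txt` should be checked before relying on it. If the LCA comes out as the root, the list is not informative and the tree needs rerooting first.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Parent = Dict[str, str]  # child -> parent; the root has no entry


def parse_domain(line: str) -> Tuple[str, str]:
    """'ID<TAB>D_0__Archaea;...' -> ('ID', 'Archaea') (assumed format)."""
    seq_id, taxonomy = line.rstrip("\n").split("\t", 1)
    first_rank = taxonomy.split(";")[0].strip()
    return seq_id, first_rank.split("__", 1)[-1]


def lca(parent: Parent, nodes: List[str]) -> str:
    """Deepest node shared by the root paths of all `nodes`."""
    paths = []
    for node in nodes:
        path = [node]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        paths.append(path)
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from a leaf, the first shared node is the deepest one.
    return next(n for n in paths[0] if n in shared)


def leaves_below(parent: Parent, node: str) -> Set[str]:
    """All leaves in the subtree rooted at `node`."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    stack, leaves = [node], set()
    while stack:
        current = stack.pop()
        if children.get(current):
            stack.extend(children[current])
        else:
            leaves.add(current)
    return leaves


def intruders(parent: Parent, taxonomy_lines: Iterable[str], domain: str = "Archaea") -> List[str]:
    """Leaves under the LCA of `domain` whose taxonomy says otherwise."""
    domain_of = dict(parse_domain(line) for line in taxonomy_lines)
    members = [seq for seq, d in domain_of.items() if d == domain and seq in parent]
    clade = leaves_below(parent, lca(parent, members))
    return sorted(clade - set(members))


if __name__ == "__main__":
    tree = {"A1": "n1", "B1": "n1", "n1": "n2", "A2": "n2", "n2": "root", "B2": "root"}
    taxonomy = ["A1\tD_0__Archaea;x", "A2\tArchaea;y", "B1\tD_0__Bacteria;z", "B2\tBacteria"]
    print(intruders(tree, taxonomy))  # ['B1']
```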
-- Siavash Mirarab
@smirarab Your question is also related to mine: https://github.com/smirarab/sepp-refs/issues/2. In SILVA 128, the FASTA file has dots too. Do you know how to make run_seqtools.py work with it?
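If the dots turn out to be what the reader rejects, one option (not confirmed in this thread as the right treatment) is to rewrite them as `-` gap characters in sequence lines while leaving headers untouched, since SILVA IDs can themselves contain dots. A minimal streaming sketch with a hypothetical helper name:

```python
from typing import Iterable, Iterator


def dots_to_dashes(lines: Iterable[str]) -> Iterator[str]:
    """Replace '.' with '-' in sequence lines; header lines pass through unchanged."""
    for line in lines:
        yield line if line.startswith(">") else line.replace(".", "-")


if __name__ == "__main__":
    print(list(dots_to_dashes([">seq.1\n", "..AC-G..\n"])))  # ['>seq.1\n', '--AC-G--\n']
```

Applied to a whole file (file names here are placeholders): `with open("in.fna") as src, open("out.fna", "w") as dst: dst.writelines(dots_to_dashes(src))`.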
I answered your questions there. The issue here has to do with the tree topology.
Any updates on this issue? Thanks!
I have the trees needed, but I have issues rooting them, as mentioned above. I remain hopeful that someone more familiar with SILVA can tell me how the rooting issue should be dealt with.
-- Siavash Mirarab
@smirarab the first sequence seems anomalous at first glance, so it might be good to exclude it. For the other sequences, I checked some of the accession numbers, and they come from genome or WGS sequence set entries. Those entries sometimes contain contamination from other domains, and I am pretty sure that this is the case here. I think we should discuss how the sequences included in the tree are selected and whether that can be optimised to leave these problematic sequences out. By the way, the current SILVA release is 138.1.
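Leaving problematic sequences out of the alignment can be done with a small filter over the FASTA file; the same IDs would also have to be pruned from the tree (for example with Newick Utilities' `nw_prune`) or the tree re-estimated. A sketch with a hypothetical helper name, assuming the exclusion list holds the first word of each header:

```python
from typing import Iterable, Iterator, Set


def drop_records(lines: Iterable[str], exclude: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping whole records whose ID is in `exclude`."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            keep = line[1:].split()[0] not in exclude
        if keep:
            yield line


if __name__ == "__main__":
    fasta = [">a\n", "AC\n", ">b desc\n", "GT\n", "TT\n", ">c\n", "CC\n"]
    print(list(drop_records(fasta, {"b"})))  # ['>a\n', 'AC\n', '>c\n', 'CC\n']
```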
I am not familiar with QIIME, the fragment insertion plugin or SEPP. I think the easiest approach would be for you to send an email to our support address (contact(at)arb-silva.de) with a short summary of what data are needed, how they are compiled, and which issues you have (maybe there are more than just the rooting of the trees?). With that information, we will try to help you solve the issues you are facing. We would also like to host the reference files on the SILVA website and, if possible, find a way to generate them automatically with new SILVA releases.
All the best Jan from the SILVA team
Hi Jan,
I will initiate an email.
Thanks Siavash
Any update on a SILVA reference database formatted for SEPP through QIIME 2?
not that I am aware of, unfortunately
Improvement Description
It should be possible to download the QIIME-compatible version of SILVA and construct a reference phylogeny and alignment for SEPP to enable 18S analyses.
Questions
@josenavas @wasade do you know if release 128 is the latest?
How and where would we host SEPP-compatible references? Within this plugin (which is already 130 MB), or in the GitHub repo?