qiime2 / q2-fragment-insertion

BSD 3-Clause "New" or "Revised" License

Silva 2019 #63

Closed. sjanssen2 closed this 4 years ago.

sjanssen2 commented 5 years ago

Since the original PR #32 is quite old, I am starting a new one here to give a clean view of the introduced changes:

Necessary changes to support non-default references such as Silva 12.8 (the default is Greengenes 13.8). I needed to add another parameter: a file path to an "info" file, which holds information about how the tree was constructed with RAxML from the multiple sequence alignment. pplacer uses this file to determine correct branch lengths for insertions. This PR solves the technical issues of #21. However, it neither compiles current Silva reference files nor hosts them anywhere.
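To make the three pieces of a non-default SEPP reference concrete, here is a minimal Python sketch. It assumes a reference consists of the multiple sequence alignment, the tree built from it with RAxML, and the RAxML info file described above. The class name `SeppReference`, its field names, and the `missing_files` helper are hypothetical illustrations, not the plugin's actual parameters.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SeppReference:
    """Hypothetical bundle of the files a SEPP reference needs (not the plugin's real API)."""

    alignment: Path   # multiple sequence alignment of the reference sequences
    tree: Path        # tree inferred from that alignment with RAxML
    raxml_info: Path  # RAxML info file; pplacer uses it for branch lengths of insertions

    def missing_files(self) -> list[str]:
        """Return the names of the fields whose file does not exist on disk."""
        return [
            name
            for name in ("alignment", "tree", "raxml_info")
            if not getattr(self, name).is_file()
        ]
```

Grouping the paths into one object means a reference such as Silva 12.8 travels as a unit, and `missing_files()` flags an incomplete set (for example, a forgotten info file) before any placement is attempted.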

There are related posts on forum.qiime2.org:
https://forum.qiime2.org/t/making-rooted-silva-tree/11011/9
https://forum.qiime2.org/t/suggestion-silva-v132-for-q2-fragment-insertion/7844/5

sjanssen2 commented 5 years ago

Hi @thermokarst,

I thought about your comment in #66 about no longer vendoring data, and I am still thinking about a good way to distribute multiple reference packages for SEPP (Greengenes 13.8 and Silva 12.8 at the moment). As a first step, I created two bioconda packages (sepp-refgg138 and sepp-refsilva128) consisting only of those 4 and 3 files, respectively.

The only issue is that the non-default reference package (Silva) needs to be imported into .qza artifacts by the user before it can be used, which might be a severe obstacle. Therefore, the build.sh script of q2-fragment-insertion imports those files into .qza artifacts. As a result, sepp-refsilva128 is a build dependency but not a run dependency, whereas sepp-refgg138 is needed at both build and run time. I hope this solves all our issues.

sjanssen2 commented 5 years ago

Ping @ebolyen @thermokarst: any comments? I'd like to provide Silva and other alternative references soon. Can you give me a rough estimate of a timeline? What are the chances of this shipping with qiime-2019.10? Thanks!

ebolyen commented 5 years ago

Hey @sjanssen2! Sorry for the delay, @thermokarst has been busy with a grant related project, I'm sure he'll poke his head in once he has some breathing room.

Our goal, as I understand it, is to get this into 2019.10. I think @thermokarst has some work in progress toward that end, but I'm not certain of the details.

thermokarst commented 5 years ago

Hey @sjanssen2! How about we just publish data.qiime2.org links to QZAs for these databases and list them at https://docs.qiime2.org/2019.7/data-resources/? I would prefer not to couple these databases with the plugin, since this can create a significant bottleneck for user installs, integration testing, etc. As @ebolyen mentioned, I am working on a time-sensitive project right now and will be preoccupied for the next week or so. Thanks!

thermokarst commented 4 years ago

Closing in favor of #66