Closed by wasade 2 weeks ago
I actually like this; should we just then remove the "Alignment profile" artifact? @qiyunzhu, what do you think?
Sorry, I slightly misspoke: the coverages file is also in "Per genome Predictions" with the same checksum, so it may be replicated information.
That's correct for the coverages; they are copied to all artifacts. However, I think we can move the alignment.tar file to the per-genome table, if that's agreeable.
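For anyone who wants to double-check the replication locally, here is a minimal sketch of comparing two downloaded files by checksum. The helper names `file_checksum` and `are_duplicates` are hypothetical, and SHA-256 is an assumption for illustration; it is not necessarily the checksum Qiita reports.

```python
import hashlib


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def are_duplicates(path_a, path_b):
    """True if both files have identical content according to their checksums."""
    return file_checksum(path_a) == file_checksum(path_b)
```

Calling `are_duplicates` on the coverages file from each artifact would return `True` if the two copies are byte-identical.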
:+1: on mv'ing alignment data but why retain a duplicate of the coverage data?
Well, the plan is that one day we will be able to merge tables for meta-analysis and, at the same time, merge the coverage data on the fly. To allow this, the easiest approach is to have the coverage data live within the main biom for all tables. The good thing is that this file is small (compared to everything else we are storing).
Why not just use a symlink? The files look like they're ~1GB too?
@antgonza @wasade I thought that the features of the "none.biom" table are genome IDs, not lineage strings. Are they? This is the single most important output of Woltka (i.e., the OGU table). I am totally fine with removing Alignment Profile, because its name and its content do not seem to match according to your description. I think that "alignment.tar" can be kept separate from all of the BIOM tables, because it is the output of Bowtie2 and the input of Woltka, from which all tables are derived. Logically, it should be separate.
FWIW, this is being addressed here: https://github.com/qiita-spots/qp-woltka/pull/30
Currently, the coverage and alignment data are associated with the "Alignment profile" artifact, which contains a "none.biom" table. The features of the "none.biom" table are lineage strings, but the identifiers in the coverage and alignment data are genome IDs. The feature table relevant to the coverage and alignment data is stored in a separate artifact, "Per genome Predictions".
It would be helpful if the coverage and alignment data were co-located with the corresponding feature table.