Closed szhan closed 1 week ago
Hi,
The "projection" documentation about its output files is here: https://ppanggolin.readthedocs.io/en/latest/user/projection.html#output-files
You are right, though: the current behavior is not what was intended, and I see where the bug is. Currently, the "gene_to_gene_family.tsv" file contains this information for ALL given input genomes, and not just the single input genome. The file is likely equal between the different "input genome" output directories. We'll get a fix for this in the upcoming version.
Thank you very much for the bug report.
Adelme
Thank you for the explanation. I checked whether "The file is likely equal between the different "input genome" output directories" holds for a few input genomes, but that didn't seem to be the case. I look forward to the updated version. Thank you.
Also, the broken link I was referring to is https://github.com/labgem/PPanGGOLiN/blob/f3ba6a1f33256f19175b570c4b711bb8970d0365/docs/user/Outputs.md#gene-families-and-genes, which doesn't seem to exist anymore. It appears in https://github.com/labgem/PPanGGOLiN/blob/f3ba6a1f33256f19175b570c4b711bb8970d0365/docs/user/projection.md
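For anyone wanting to repeat the comparison mentioned above (whether the mapping file is identical across the per-genome output directories), here is a minimal sketch. It assumes each input genome has its own subdirectory under one output directory, each containing a "gene_to_gene_family.tsv"; that layout and the function name `group_identical_mappings` are assumptions for illustration, not something confirmed by the documentation.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_identical_mappings(output_dir, filename="gene_to_gene_family.tsv"):
    """Group per-genome subdirectories by the SHA-256 digest of their mapping file.

    Assumed (hypothetical) layout: <output_dir>/<genome_name>/<filename>.
    """
    groups = defaultdict(list)
    for path in sorted(Path(output_dir).glob(f"*/{filename}")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path.parent.name)
    # One inner list per distinct file content; a single list means all files match.
    return list(groups.values())
```

If the function returns more than one group, the files differ between genomes, which would match what I observed.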
Alright, thank you for the additional input. Indeed, I misunderstood what you meant; I see the broken link now! I will fix this as well.
The fix for this issue has been released in v2.1.0.
I have been running `projection` on a reconstructed pangenome and a set of assembly FASTA files for input genomes, in order to assign each gene to a gene family in the pangenome for each input genome.
I tried consulting the documentation about the output of `projection`, but the link doesn't seem to go anywhere (https://github.com/labgem/PPanGGOLiN/blob/f3ba6a1f33256f19175b570c4b711bb8970d0365/docs/user/projection.md).
The documentation states that `gene_to_gene_family.tsv` "provides the mapping of genes to gene families of the pangenome." I was expecting one line per gene of an input genome, each assigning that gene to a gene family in the reconstructed pangenome. But this isn't what I got. Instead, I got files with hundreds of thousands of lines, even though an input genome contains 2.5k to 2.9k genes.
Any clarifications would be much appreciated. Thank you in advance.
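To get the one-line-per-gene view described above from a file that mixes several genomes, one option is to filter it against the gene IDs of a single genome. This is only a sketch: it assumes the file is tab-separated with the gene ID in the first column and no header row, and that the gene IDs for the genome of interest are already known from elsewhere. The function name `filter_mapping` is hypothetical.

```python
import csv


def filter_mapping(lines, genome_gene_ids):
    """Keep only rows whose first column is a gene ID of the genome of interest.

    `lines` is any iterable of text lines (e.g. an open file handle).
    Assumption: tab-separated rows with the gene ID in the first column.
    """
    reader = csv.reader(lines, delimiter="\t")
    return [row for row in reader if row and row[0] in genome_gene_ids]
```

With an open file handle passed as `lines` and a set of gene IDs, the number of rows returned should then be close to the 2.5k to 2.9k genes expected per genome.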