Closed by ChristelKrueger 1 year ago
Hi, let me tag @DongzeHE, who wrote loadFry().
Thank you! :-)
Hello @ChristelKrueger,
Sorry, I missed this message!
Alevin-fry USA mode works as follows: it relies on the t2g_3col.tsv file. All genes that show up in the second column of this file should exist in the final count matrix. So, if a gene is not in the final count matrix, I would suspect that the gene is not in the t2g_3col.tsv file. Therefore, to answer your question, could you please tell me where you got the gene_ids? Could you also check whether the missing genes are in the t2g_3col.tsv file?
Thanks, Dongze
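For anyone wanting to run the check suggested above, here is a minimal sketch in Python. It assumes t2g_3col.tsv is tab-separated with the gene ID in the second column, as described in the reply; the transcript names, the gene IDs, and the third-column values in the toy data are made up for illustration, and the helper names are hypothetical.

```python
import csv
import io


def genes_in_t2g(t2g_lines):
    """Collect the gene IDs found in the second column of a t2g file.

    `t2g_lines` is any iterable of tab-separated lines, e.g. an open file.
    """
    reader = csv.reader(t2g_lines, delimiter="\t")
    return {row[1] for row in reader if len(row) >= 2}


def missing_from_t2g(gene_ids, t2g_genes):
    """Return the gene IDs of interest that never appear in the t2g file."""
    return sorted(g for g in set(gene_ids) if g not in t2g_genes)


# Toy stand-in for t2g_3col.tsv (hypothetical IDs; third column assumed
# to hold the splicing status).
example_t2g = io.StringIO(
    "ENST0001\tENSG0001\tS\n"
    "ENSG0001-I\tENSG0001\tU\n"
    "ENST0002\tENSG0002\tS\n"
)

t2g_genes = genes_in_t2g(example_t2g)
missing = missing_from_t2g(["ENSG0001", "ENSG0002", "ENSG0003"], t2g_genes)
print(missing)
```

With a real file, replace the io.StringIO object with open("t2g_3col.tsv", newline="") and pass in the gene_ids that appear to be missing from the count matrix.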
Thank you @DongzeHE for these explanations - it helped me to understand what had gone wrong. I had been experimenting with the counts outside the alevin folder structure but had made a silly mistake while copying (which took me an embarrassingly long time to realise ...). Sorry about this - entirely my bad! Done correctly, the numbers do add up as expected.
Thank you for making Fishpond! I have been using the loadFry function to aggregate USA counts produced by Alevin Fry (adding up U+S+A). I would have expected that the summarised counts table would have a third of the gene_ids but actually there are fewer. The only filtering I found in the documentation was nonzero but that defaults to FALSE. Looking up some of the ENSGs that are missing from the collated output, it seems that they are pseudogenes. Is this some additional filtering that loadFry does?
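To make the "a third of the gene_ids" expectation concrete, here is a small sketch of the U+S+A arithmetic in plain Python. It is not how loadFry is implemented; it assumes, purely for illustration, that each cell's counts are stored as three consecutive blocks of equal length (spliced, then unspliced, then ambiguous), and the function name collapse_usa is hypothetical.

```python
def collapse_usa(row, n_genes):
    """Sum spliced, unspliced and ambiguous counts per gene for one cell.

    Assumes `row` holds 3 * n_genes values laid out as
    [spliced block | unspliced block | ambiguous block].
    """
    if len(row) != 3 * n_genes:
        raise ValueError("expected exactly 3 * n_genes counts per cell")
    spliced = row[:n_genes]
    unspliced = row[n_genes:2 * n_genes]
    ambiguous = row[2 * n_genes:]
    return [s + u + a for s, u, a in zip(spliced, unspliced, ambiguous)]


# One toy cell with 2 genes: 6 USA entries collapse to 2 gene-level counts.
collapsed = collapse_usa([5, 0, 2, 1, 0, 3], n_genes=2)
print(collapsed)
```

Under that assumption the collapsed table has exactly one third as many columns as the USA table, so a smaller number points to a mismatch in the inputs rather than to the summing step.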