kfuku52 / amalgkit

RNA-seq data amalgamation for large-scale evolutionary transcriptomics
BSD 3-Clause "New" or "Revised" License

.tsv's have columns shifted by one in amalgkit output #86

Closed docxology closed 2 years ago

docxology commented 2 years ago

As shown by the table in https://github.com/kfuku52/amalgkit/issues/85, which was copied from Species.tau.tsv, some TSV files in /curate/tables/ are missing the first column name (it should be e.g. "locus_ID" or "gene_ID"), so all other column names are shifted over by one. This breaks downstream pipelines, and my only fix so far has been to manually add a column name for gene_ID and shift the other column names to the right.

This error in the TSV is seen in the following files in Tables/:

- Species.tau
- Species.tc
- Species.uncorrected.curate_group.mean
- Species.uncorrected.tc
- Species.curate_group.mean

The .sva and .sra TSV column names are fine.
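
For anyone who needs the same workaround before a fix lands, the manual repair described above can be sketched in Python. This is only a sketch under stated assumptions: the function name `repair_shifted_header`, the default label `gene_ID`, and the sample tissue and gene names are all hypothetical, and the check simply compares the header width against the first data row.

```python
import csv
import io


def repair_shifted_header(text, first_column="gene_ID"):
    """Prepend a first-column name when the header is one field short.

    `first_column` is a hypothetical default; use whatever label the
    downstream pipeline expects.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if len(rows) < 2:
        return text  # no data row to compare the header against

    header, first_data = rows[0], rows[1]
    # A shifted file has exactly one fewer header field than data fields.
    if len(header) == len(first_data) - 1:
        rows[0] = [first_column] + header

    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Made-up example content, not real amalgkit output.
broken = "brain\tliver\ngeneA\t0.5\t1.2\ngeneB\t3.0\t0.1\n"
fixed = repair_shifted_header(broken)
print(fixed)
```

Running the function on an already repaired file leaves it unchanged, so it should be safe to apply to every table in the directory, including the ones whose headers are fine.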

Hego-CCTB commented 2 years ago

Ah, this is probably because they are matrices with the gene name as a row identifier, rather than being an entry in a column. So when they are saved by R, the identifier column doesn't get a column name and the header shifts over by one.
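
The effect is easy to reproduce outside R. The following is a minimal Python sketch, not the actual amalgkit code: `write_matrix` and the sample data are hypothetical, and it only mimics what happens when row labels are written into each data row while the header lists the data columns alone.

```python
import csv
import io


def write_matrix(row_names, col_names, values, id_column=None):
    """Write a labeled matrix as TSV text.

    With id_column=None the header lists only the data columns, so it is
    one field shorter than every data row. Passing a name for the
    identifier column keeps header and rows aligned.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header = list(col_names)
    if id_column is not None:
        header = [id_column] + header
    writer.writerow(header)
    for name, row in zip(row_names, values):
        writer.writerow([name] + list(row))
    return out.getvalue()


# Made-up example data.
genes = ["geneA", "geneB"]
tissues = ["brain", "liver"]
values = [[0.5, 1.2], [3.0, 0.1]]

shifted = write_matrix(genes, tissues, values)
labeled = write_matrix(genes, tissues, values, id_column="GeneID")
print(shifted)
print(labeled)
```

In the first output the header has two fields while each data row has three, which is the shift reported above. Naming the identifier column explicitly gives the header the same width as the rows.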

I'll take care of this!

docxology commented 2 years ago

Thank you @Hego-CCTB !

Hego-CCTB commented 2 years ago

Added a "GeneID" column name to the .tsv outputs in amalgkit ver. 0.5.1 https://github.com/kfuku52/amalgkit/commit/0bf4fc3083336883b3aab8aed87b1f1f0e9dba99