kfuku52 / amalgkit

RNA-seq data amalgamation for a large-scale evolutionary transcriptomics
BSD 3-Clause "New" or "Revised" License

Species.tau file has NAs #85

docxology closed this issue 2 years ago

docxology commented 2 years ago

After an amalgkit run with 3 samples excluded, the .tau output file gives values for "highest" and "order", but the numerical tau value is NA for every gene.

Example below.

|   | tau | highest | order |
| -- | -- | -- | -- |
| lcl\|NC_037638.1_cds_XP_623975.1_1 | NA | embryo_W | embryo_W\|ovary_W\|midgut_W\|embryo_M\|hypopharyngeal_glands_W\|larval_gut_W\|mandibular_gland_W\|nasonov_gland_W\|skeletal_muscle_W\|malpighian_tubule_W\|adipose_Q\|head_and_thorax_Q\|adipose_W\|sting_gland_W\|antennae_W\|second_thoracic_ganglia_W\|brain_Q\|mushroom_bodies_M\|brain_W\|mandibular_gland_Q\|mushroom_bodies_W |
| lcl\|NC_037638.1_cds_XP_623952.2_2 | NA | embryo_M | embryo_M\|ovary_W\|nasonov_gland_W\|malpighian_tubule_W\|midgut_W\|adipose_Q\|mandibular_gland_W\|hypopharyngeal_glands_W\|antennae_W\|sting_gland_W\|brain_Q\|second_thoracic_ganglia_W\|larval_gut_W\|skeletal_muscle_W\|mushroom_bodies_M\|brain_W\|mushroom_bodies_W\|embryo_W\|adipose_W\|mandibular_gland_Q\|head_and_thorax_Q |
| lcl\|NC_037638.1_cds_XP_006557469.1_3 | NA | embryo_M | embryo_M\|embryo_W\|antennae_W\|midgut_W\|second_thoracic_ganglia_W\|nasonov_gland_W\|hypopharyngeal_glands_W\|skeletal_muscle_W\|mandibular_gland_W\|malpighian_tubule_W\|adipose_Q\|sting_gland_W\|larval_gut_W\|mushroom_bodies_M\|ovary_W\|brain_Q\|brain_W\|mandibular_gland_Q\|head_and_thorax_Q\|adipose_W\|mushroom_bodies_W |
| lcl\|NC_037638.1_cds_XP_006557467.1_4 | NA | embryo_M | embryo_M\|embryo_W\|midgut_W\|hypopharyngeal_glands_W\|mandibular_gland_Q\|larval_gut_W\|brain_W\|mushroom_bodies_W\|adipose_W\|sting_gland_W\|head_and_thorax_Q\|ovary_W\|second_thoracic_ganglia_W\|mushroom_bodies_M\|malpighian_tubule_W\|nasonov_gland_W\|adipose_Q\|mandibular_gland_W\|skeletal_muscle_W\|antennae_W\|brain_Q |

Hego-CCTB commented 2 years ago

Did amalgkit finish without errors? Do the other amalgkit outputs of this run look OK, or do they have problems as well?

docxology commented 2 years ago

Amalgkit finished with no errors.

Yes, I have done other runs where the .tau values were numerical and complete.

Hego-CCTB commented 2 years ago

How about the species.curate_group.mean.tsv? Does it look "normal" (all tissues present, values reasonable, etc.)?

docxology commented 2 years ago

"How about the species.curate_group.mean.tsv ? Do they look "normal" (like, all tissues there, values reasonable, etc.)?" -- Yes, this file looks normal.

Hego-CCTB commented 2 years ago

@C20H25N30 I think I found the cause of this, but I'm still investigating where and why it happens. This is specific to your dataset, so let's continue this discussion via mail.

Hego-CCTB commented 2 years ago

This issue was caused by all samples of a certain curate_group (i.e. tissue) being marked for exclusion, which caused problems with the tau calculation further down the line.
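To illustrate how a single empty tissue can turn every tau value into NA, here is a minimal sketch. It assumes tau is the commonly used tissue-specificity index, tau = sum(1 - x_i / x_max) / (n - 1), computed over per-group mean expression; it is not amalgkit's actual code. The function name `tau_index` and the example values are hypothetical, and NaN stands in for the missing mean of a group with no remaining samples.

```python
import math

def tau_index(group_means):
    """Tissue-specificity index for one gene (hypothetical helper).

    group_means: per-curate_group mean expression; NaN marks a group
    whose samples were all excluded (an assumption for illustration).
    """
    n = len(group_means)
    # The maximum is taken over observed groups only, so a "highest"
    # group can still be reported even when one group is missing.
    observed = [x for x in group_means if not math.isnan(x)]
    if n < 2 or not observed or max(observed) == 0:
        return math.nan
    x_max = max(observed)
    # Any NaN term propagates through the sum, so tau becomes NaN.
    return sum(1 - x / x_max for x in group_means) / (n - 1)

print(tau_index([10.0, 0.0, 0.0]))           # expressed in one group only
print(tau_index([5.0, 5.0, 5.0]))            # uniform expression
print(tau_index([10.0, float("nan"), 0.0]))  # one group has no samples left
```

Under these assumptions, one missing group mean is enough to make tau undefined for every gene, while the ranking of the remaining groups is unaffected.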

I have added a failsafe that works like this:

  1. Print a warning message stating that all samples of a tissue have been marked for exclusion.
  2. Continue the algorithm without this tissue.
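The two steps above could be sketched roughly as follows. This is an illustration only, not the code from the linked commit: the function name `drop_fully_excluded_groups`, the record layout, and the convention that `exclusion == "no"` marks a retained sample are all assumptions made for this example.

```python
import warnings
from collections import defaultdict

def drop_fully_excluded_groups(samples):
    """Return the curate_groups that still have at least one retained sample.

    samples: list of dicts with the (assumed) keys 'run', 'curate_group'
    and 'exclusion', where exclusion == "no" means the sample is kept.
    """
    kept_counts = defaultdict(int)
    for sample in samples:
        # Adding a bool registers the group even when the sample is excluded.
        kept_counts[sample["curate_group"]] += sample["exclusion"] == "no"

    remaining = []
    for group, n_kept in kept_counts.items():
        if n_kept == 0:
            # Step 1: warn that the whole tissue is marked for exclusion.
            warnings.warn(
                f"All samples of curate_group '{group}' are marked for "
                "exclusion; continuing without it."
            )
        else:
            # Step 2: carry on with the tissues that still have samples.
            remaining.append(group)
    return remaining

samples = [
    {"run": "R1", "curate_group": "brain_W", "exclusion": "no"},
    {"run": "R2", "curate_group": "brain_W", "exclusion": "low_quality"},
    {"run": "R3", "curate_group": "ovary_W", "exclusion": "low_quality"},
]
print(drop_fully_excluded_groups(samples))
```

Filtering the group list before any per-group means are computed keeps the missing tissue out of the downstream tau calculation entirely, instead of patching NA values afterwards.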

updated in amalgkit ver. 0.5.1: https://github.com/kfuku52/amalgkit/commit/0bf4fc3083336883b3aab8aed87b1f1f0e9dba99