Open sparthib opened 5 days ago
It is documented in create_config:
> ?FLAMES::create_config
...
min_sup_cnt - Minimum number of read support an isoform
decrease this number will significantly increase the
number of isoform detected.
min_cnt_pct - Minimum percentage of count for an isoform
relative to total count for the same gene.
min_sup_pct - Minimum percentage of count for an splice chain
that support a given transcript start/end site
combination.
I believe transcript_count.bad_coverage.csv.gz keeps track of alignments with coverage less than min_tr_coverage, i.e. your read aligned to transcript A but covers less than min_tr_coverage of it. I am adding oarfish as an optional quantification method, which will hopefully give better counts, as it will attempt to allocate the ambiguous alignments.
https://github.com/COMBINE-lab/oarfish
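To make the bad-coverage description above concrete, here is a minimal Python sketch of that kind of split. It is only an illustration of the idea as described, not FLAMES's actual code: the function names, the input tuple layout, and the 0.75 threshold used in the example are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record: (transcript_id, aligned_bases, transcript_length).
Alignment = Tuple[str, int, int]


def coverage_fraction(aligned_bases: int, transcript_length: int) -> float:
    """Fraction of the transcript that the read alignment covers."""
    if transcript_length <= 0:
        raise ValueError("transcript_length must be positive")
    return aligned_bases / transcript_length


def split_by_coverage(
    alignments: Iterable[Alignment], min_tr_coverage: float
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Count alignments per transcript, separating well-covered from poorly covered.

    Assumption: an alignment counts as 'bad coverage' when it covers less
    than min_tr_coverage of the transcript, as described in the comment above.
    """
    good: Dict[str, int] = {}
    bad: Dict[str, int] = {}
    for transcript_id, aligned_bases, transcript_length in alignments:
        frac = coverage_fraction(aligned_bases, transcript_length)
        target = good if frac >= min_tr_coverage else bad
        target[transcript_id] = target.get(transcript_id, 0) + 1
    return good, bad


# Example with made-up numbers and a made-up threshold of 0.75.
example = [("A", 900, 1000), ("A", 300, 1000), ("B", 500, 1000)]
good_counts, bad_counts = split_by_coverage(example, min_tr_coverage=0.75)
```

In this toy input, one read covers 90% of transcript A and lands in the regular counts, while the other two reads fall below the threshold and are counted separately.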
I am not sure I understand what you mean by multiple sequencing depths. Do you have multiple samples with different sequencing depths?
Thank you! Yes, I have samples with different sequencing depths. Is it possible to change the cutoff from an absolute count to perhaps TPM? Although I am not sure how much difference this would make for isoforms with an extremely small number of reads aligned to them.
Thanks! Sowmya
Not at the moment. It does sound reasonable, though; maybe we could update it.
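Until something like that exists, one possible workaround is to derive a per-sample min_sup_cnt from a fixed counts-per-million target. The Python sketch below is only an illustration under stated assumptions: it uses CPM rather than TPM (no transcript-length normalisation), and the helper name, the sample names, the read totals, and the target of 2.5 CPM are all hypothetical and not part of FLAMES.

```python
import math
from typing import Dict


def depth_scaled_min_count(total_reads: int, target_cpm: float, floor: int = 1) -> int:
    """Smallest read count whose CPM reaches target_cpm in a sample of total_reads.

    CPM = count / total_reads * 1e6, so count >= target_cpm * total_reads / 1e6.
    The floor keeps the cutoff from dropping to zero in very shallow samples.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if target_cpm <= 0:
        raise ValueError("target_cpm must be positive")
    return max(floor, math.ceil(target_cpm * total_reads / 1_000_000))


# Made-up read totals for three replicates of different depth.
sample_depths: Dict[str, int] = {
    "rep_small": 800_000,
    "rep_medium": 2_000_000,
    "rep_large": 4_000_000,
}

# One cutoff per sample, all corresponding to the same relative abundance.
cutoffs = {
    name: depth_scaled_min_count(depth, target_cpm=2.5)
    for name, depth in sample_depths.items()
}
```

Each value could then be used as min_sup_cnt when configuring the run for that sample. For very shallow samples the cutoff collapses to the floor, so, as noted above, scaling makes little difference for isoforms supported by only a handful of reads.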
Hi there,
I am running the FLAMES single-cell pipeline and was wondering if I could get clarification on how the min_sup_cnt parameter affects samples of different sequencing depths. For example, when I ran it under the default setting, i.e. min_sup_cnt = 5, smaller replicates had fewer unique isoforms in the transcript_counts matrix than bigger replicates. This makes sense. I changed the setting to 2 to see if the number of isoforms in the final output (transcript_count.csv.gz) would increase, and it did. However, the number of isoforms in transcript_count.bad_coverage.csv.gz also increased, while I had imagined it would be the opposite, since with a more lenient read-count threshold I would expect the bad coverage counts matrix to contain fewer isoforms. I guess my question is: what exactly does transcript_count.bad_coverage.csv.gz keep track of? Also, if you have suggestions for a more flexible setting other than min_sup_cnt to account for multiple sequencing depths, that would be great.
Thanks,
Sowmya