Closed shrutikhare-git closed 2 years ago
Hi @shrutivijayk,
There is a minimum number (and type) of variants that are required in order to proceed with fitting the fitness error model.
The type corresponds to variants that:
The minimum number is defined as 30 * #experiment_replicates, so if you have 3 experiment_replicates you need at least 90 variants that are observed at least once in all samples and have input read counts above the threshold defined in [2] above.
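As a rough illustration of the rule above, here is a minimal Python sketch of the check. It is not DiMSum's own code: the function names, the column names and the threshold value are all hypothetical. It also assumes that "above the threshold" means strictly greater in every input sample.

```python
# Sketch only: names, columns and threshold are hypothetical, not DiMSum internals.

def min_variants_required(n_replicates, per_replicate=30):
    """Minimum number of qualifying variants: 30 * #experiment_replicates."""
    return per_replicate * n_replicates


def count_qualifying_variants(rows, input_cols, output_cols, min_input_count):
    """Count variants seen at least once in all samples whose input
    counts exceed min_input_count (assumed: in every input sample)."""
    n = 0
    for row in rows:
        seen_everywhere = all(row[c] >= 1 for c in input_cols + output_cols)
        inputs_above = all(row[c] > min_input_count for c in input_cols)
        if seen_everywhere and inputs_above:
            n += 1
    return n


# Toy count table with 3 replicates (made-up numbers).
input_cols = ["input1", "input2", "input3"]
output_cols = ["output1", "output2", "output3"]
rows = [
    {"input1": 50, "input2": 60, "input3": 55, "output1": 5, "output2": 7, "output3": 6},
    {"input1": 5, "input2": 60, "input3": 55, "output1": 5, "output2": 7, "output3": 6},
    {"input1": 50, "input2": 60, "input3": 55, "output1": 0, "output2": 7, "output3": 6},
]

required = min_variants_required(n_replicates=3)
qualifying = count_qualifying_variants(rows, input_cols, output_cols, min_input_count=10)
print(f"{qualifying} qualifying variants, {required} required")
```

Running the same kind of count on your own table, with the threshold DiMSum reports, should show how far the data are from the requirement.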
The error you are getting means that your data do not currently meet these requirements. This normally happens when very few variants are retained from previous stages.
You can check the output of stage 4, specifically the read count table in the file "DiMSum_Project_variant_data_merge.tsv" (https://github.com/lehner-lab/DiMSum/blob/master/docs/FILEFORMATS.md#output-files) to see how many variants have been retained, what their counts are in different samples and whether this is what you expected. You can attach it here if small enough or send a link to share it from some other location if you like.
Hope this helps!
Thanks for your reply. I have submitted my own VariantCount file to DiMSum for this run. It has ~700 variants (synonymous+nonsynonymous) which are present at least once in all 6 conditions (3 inputs and 3 outputs from 3 biological replicates). I used no explicit cutoffs for fitnessMinInputCountAll etc. so default should be 0 and this should not remove any variants.
I have attached the file here. The first row contains the WT counts. (Sorry, I had to remove the sequence information as our data are not yet published.) Thanks for your help.
Hi @shrutivijayk,
I had a look and this first plot below is fitness vs input read count (log scale) for your data with the threshold chosen by DiMSum indicated with the vertical dashed line:
You can see that very few variants satisfy this threshold and in general fitness is quite strongly anti-correlated with input counts - this suggests you probably need to sequence deeper.
This is the same plot for another dataset with sufficient sequencing data. Above a certain input read count threshold, fitness is "well-behaved", i.e. it shows no obvious correlation with input read counts (which is desirable if we want the estimates to report on selection and not simply abundance in the input):
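To make the "correlation with input read counts" point concrete, here is a small Python sketch with made-up counts. It assumes fitness is approximated as the log ratio of output to input counts relative to WT. The helper names are hypothetical and this is not DiMSum's implementation, which also models errors and merges replicates.

```python
import math

# Sketch only: hypothetical helpers and made-up counts, not DiMSum code.

def log_fitness(inp, out, wt_inp, wt_out):
    """Assumed fitness proxy: ln(out/inp) - ln(wt_out/wt_inp). Needs non-zero counts."""
    return math.log(out / inp) - math.log(wt_out / wt_inp)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# (input count, output count) per variant; low-input variants look "fitter".
variants = [(10, 40), (100, 200), (1000, 1000), (10000, 5000)]
wt_inp, wt_out = 5000, 5000

log_inputs = [math.log10(i) for i, _ in variants]
fitness = [log_fitness(i, o, wt_inp, wt_out) for i, o in variants]
r = pearson(log_inputs, fitness)
print(f"correlation between log10(input) and fitness: {r:.2f}")
```

A strongly negative value, as in this toy example, is the pattern seen in the first plot. A value near zero above the threshold is what the second plot shows.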
Hope this helps!
(I'm going to close this issue now because I think it is clear that this is not an issue with DiMSum but rather an issue with the data you are trying to analyse.)
Thank you. The experiment is not a growth-based assay. Only a subset of mutants is expected to survive or become enriched upon selection. Can DiMSum be used to analyse such data? Is there any way to change the input variant count cutoff that DiMSum is using?
Hi, I am the same user who had issues regarding absence of WT sequence in the data. I have now reformatted the mutant counts file I had from my own pipeline and used it as VariantCount.txt. I am now getting the following error:
DiMSum STAGE 5 (STEAM): ANALYSE VARIANT COUNTS
Filtering out low count nucleotide variants... Done
Aggregating counts for biological output replicates... Done
Fit error model... Error: Cannot proceed with error modelling: insufficent number of variants satisfying full fitness range
Execution halted
What does this mean and how can I solve it? Thank you.