Closed paulocaldas closed 3 years ago
The issue is how DESeq2 is handling your data (hence, why I wanted you to open up a separate thread). Can you please copy and paste the output?
the output from the ARTDeco -mode diff_exp_dogs? I basically get this error and empty folders ...
The error message. Can you copy and paste that? It makes it easier for me to examine it than a screenshot.
sure. here it goes
command line:
(ARTDeco) pcaldas@HP:/mnt/data/CREMCOMICSdo001/pcaldas/ctcf_human_artdeco/artdeco.Param2⟫ ARTDeco -mode diff_exp_read_in -meta-file input/dea_meta.txt -bam-files-dir mapping/

output:
Running diff_exp_read_in mode...
Loading ARTDeco file structure...
Reformatted meta file exists...
Reformatted comparisons file exists...
ARTDeco will generate the following files:
./diff_exp_read_in/control-ctcf_kd-read_in_assignment.txt
./diff_exp/control-ctcf_kd-results.txt
./diff_exp_read_in/control-ctcf_kd-read_in.txt
Running DESeq2 on gene expression data...
/mnt/data/CREMCOMICSdo001/comics/miniconda2/envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error in (function (countData, colData, design, tidy = FALSE, ignoreRank = FALSE, : ncol(countData) == nrow(colData) is not TRUE
warnings.warn(x, RRuntimeWarning)
Combining differential expression results and read-in information...
Inferring read-in genes for upregulated genes with log2 fold change > 2, p-value < 0.05, and FPKM > 0.25...
Read-in level threshold is -1...
Using all genes...
Perfect. Thank you. I'll take a look.
Can you copy and paste your meta file as well as the first few rows (via a head command) of your gene expression file?
my metadata file only has two lines for now, bc I was testing the program before using too many samples
Experiment	Group
SRR1657556_Aligned_sorted	control
SRR1657557_Aligned_sorted	ctcf_kd
and my gene.exp.raw.txt file looks like this
ID	Length	SRR1657556_Aligned_sorted	SRR1657557_Aligned_sorted
ENST00000623944.1	1027.0	198.5	113.5
ENST00000555087.1	78686.0	146.0	167.0
ENST00000587595.1	29520.0	15.0	22.5
ENST00000323055.10	324040.0	18426.5	20380.5
ENST00000607561.1	107.0	0.5	1.5
ENST00000567827.1	10183.0	186.5	183.5
ENST00000548247.1	27880.0	359.5	376.5
ENST00000357771.5	7518.0	88.5	83.0
ENST00000425674.1	37855.0	817.0	883.0
This might be a dumb question, but is your meta file tab-delimited?
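For anyone hitting the same DESeq2 message: `ncol(countData) == nrow(colData)` fails when the number of sample columns in the count table differs from the number of sample rows in the meta file. Below is a minimal sketch of a check you could run by hand before calling ARTDeco. It is not part of ARTDeco; the function names are hypothetical, and it assumes the two file layouts pasted above (a two-column meta file with a header, and a count table whose first two columns are `ID` and `Length`).

```python
import csv
import io


def read_tsv(text):
    """Parse tab-delimited text into a list of non-empty rows."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def check_meta_against_counts(meta_text, counts_text, id_columns=("ID", "Length")):
    """Return a list of human-readable problems; an empty list means no mismatch found.

    Hypothetical helper: the layouts are assumed from the files shown in this thread.
    """
    meta_rows = read_tsv(meta_text)
    counts_header = read_tsv(counts_text)[0]
    problems = []

    # A meta line that does not split into exactly two fields is often
    # space-delimited rather than tab-delimited.
    for line_no, row in enumerate(meta_rows, start=1):
        if len(row) != 2:
            problems.append(
                f"meta line {line_no}: expected 2 tab-separated fields, got {len(row)}"
            )

    # Skip the header row of the meta file, keep the sample names.
    meta_samples = [row[0] for row in meta_rows[1:] if len(row) == 2]
    count_samples = [col for col in counts_header if col not in id_columns]

    missing = sorted(set(meta_samples) - set(count_samples))
    extra = sorted(set(count_samples) - set(meta_samples))
    if missing:
        problems.append(f"in meta file but not in count table: {missing}")
    if extra:
        problems.append(f"in count table but not in meta file: {extra}")
    return problems


# Contents copied from the files pasted in this thread.
META = (
    "Experiment\tGroup\n"
    "SRR1657556_Aligned_sorted\tcontrol\n"
    "SRR1657557_Aligned_sorted\tctcf_kd\n"
)
COUNTS = (
    "ID\tLength\tSRR1657556_Aligned_sorted\tSRR1657557_Aligned_sorted\n"
    "ENST00000623944.1\t1027.0\t198.5\t113.5\n"
)

for problem in check_meta_against_counts(META, COUNTS):
    print(problem)
```

With the files as pasted, the check finds nothing to report. A space-delimited meta file, or a stale count table left over from an earlier run with different samples, would show up as a listed mismatch.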
so I kind of got past the error, but I'm not sure why ...
I just deleted all my files and started over. But yes, my files were tab-delimited, so I'm not really sure what the problem was. I now have my control-experiment-results.txt inside a diff_exp_dogs folder. Sorry that my troubleshooting was not really helpful.
although, one thing that I noticed was that by running the general command with the meta file option
ARTDeco -mode get_dogs -gtf-file input/modified_annotation_file.gtf -chrom-sizes-file input/ref_genome.chrom.sizes -home-dir $PWD -bam-files-dir mapping/ -cpu 16 -min-dog-len 3000 -dog_window 300 -min_dog_coverage 0.3 -meta-file input/dea_meta.txt
this file is not created (differential analysis does not run). The analysis only worked when I used the diff_exp_dogs mode
ARTDeco -mode diff_exp_dogs -meta-file input/dea_meta.txt -bam-files-dir mapping/
I'm confused about what you are trying to point out. Your initial command is for the diff_exp_read_in mode and this command is for the diff_exp_dogs mode. You are changing run modes, so it's impossible to know if you fixed the problem.
What is your desired task? You seem to be changing it.
maybe I'm mixing things up. My goal was to run a differential gene expression analysis to check whether the genes that have transcriptional readthrough (DoGs) are highly expressed in my experiments vs. control. I understood that including the meta file in the first command line would run the diff_exp_dogs mode automatically.
I would recommend not specifying a mode if you want all possible outputs. What you want is to look at the diff_exp_read_in folder and only consider the differential expression information. From that you can get the differentially expressed genes. Then, get the readthrough levels from readthrough.txt. This is the best way to get the information. I'm personally not a fan of DoGs as a measure of readthrough. Readthrough levels are far more representative.
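The two-step lookup described above (pick the differentially expressed genes, then attach their readthrough levels) can be sketched roughly as follows. This is only an illustration under stated assumptions: the column names (`ID`, `log2FoldChange`, `pvalue`, `readthrough`) and the toy values are hypothetical and are not taken from ARTDeco's actual output files, so adjust them to whatever headers your files contain. The thresholds mirror the ones printed in the log earlier in this thread (log2 fold change > 2, p-value < 0.05).

```python
import csv
import io


def load_table(text, key):
    """Read tab-delimited text with a header into a dict keyed by one column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[key]: row for row in reader}


def upregulated_with_readthrough(de_text, readthrough_text,
                                 min_log2fc=2.0, max_pval=0.05):
    """Return (gene, log2FC, readthrough level) for upregulated genes.

    Hypothetical helper; column names are assumptions, not ARTDeco's schema.
    Genes with no readthrough entry are skipped.
    """
    de = load_table(de_text, "ID")
    readthrough = load_table(readthrough_text, "ID")
    selected = []
    for gene, row in de.items():
        log2fc = float(row["log2FoldChange"])
        pval = float(row["pvalue"])
        if log2fc > min_log2fc and pval < max_pval and gene in readthrough:
            selected.append((gene, log2fc, float(readthrough[gene]["readthrough"])))
    return selected


# Toy tables with made-up values, only to show the shape of the join.
DE_RESULTS = (
    "ID\tlog2FoldChange\tpvalue\n"
    "geneA\t3.1\t0.001\n"
    "geneB\t0.4\t0.6\n"
    "geneC\t2.5\t0.01\n"
)
READTHROUGH = (
    "ID\treadthrough\n"
    "geneA\t0.12\n"
    "geneB\t0.30\n"
)

for gene, log2fc, level in upregulated_with_readthrough(DE_RESULTS, READTHROUGH):
    print(gene, log2fc, level)
```

In the toy data, geneB fails the thresholds and geneC has no readthrough entry, so only geneA is reported.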
I would also generate a comparisons file to make your life easier. Ideally, your final command would be the following:
ARTDeco -gtf-file input/modified_annotation_file.gtf -chrom-sizes-file input/ref_genome.chrom.sizes -home-dir $PWD -bam-files-dir mapping/ -cpu 16 -meta-file input/dea_meta.txt -comparisons-file comparisons.txt
The get_dogs mode will not generate the diff_exp_dogs folder. The output structure is explained in detail in the readme.
Does this help?
That is in fact a discussion we're having in our group: what would be the best way to characterize readthrough. But either way, that was super helpful. I think everything is working for me now. I just generated all the outputs without any problem. Thanks a lot for all the quick feedback during the day!
btw, I've been playing with other tools, including DoGFinder, for the last few days, and I really enjoy how straightforward it is to run your tool without any issues. ARTDeco is really handy! Cheers!
Apologies for the late response on this comment. I have been visiting family the past few days. I am glad that you find ARTDeco to be straightforward because that was one of my main goals.
In my experience, readthrough level is the best single measure of transcriptional readthrough, in part because of its flexibility. The single best use for DoGs is to identify major regions of readthrough for visualization. That is what I use them for all the time.
Hey, First I wanted to thank you for writing such a useful tool!
I have some issues with getting the DoG differential expression, and the error is similar to the one in this issue.
I have tried both running in 'diff_exp_dogs' mode and without any specified mode, however I get no output, just this error.
DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
sys.exit(load_entry_point('ARTDeco==0.4', 'console_scripts', 'ARTDeco')())
/localenv/balan/miniconda3/envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error in (function (countData, colData, design, tidy = FALSE, ignoreRank = FALSE, : ncol(countData) == nrow(colData) is not TRUE
warnings.warn(x, RRuntimeWarning)
Could you please point me in the right direction?
I would mention that both 'preprocess' and the 'get_dogs' modes work without error.
All the best, Mirela
Can you please open a new issue for this?
Everything works fine without problems, except when I try the differential expression analysis mode. I understand that something is wrong with the number and/or names of the files in my meta file vs. the experiments being considered for the analysis, but I cannot figure out the source of the error.