sjroth / ARTDeco

Count error in diff_exp_dogs #7

Closed paulocaldas closed 3 years ago

paulocaldas commented 3 years ago

Everything works fine, except when I try the differential expression analysis mode. I understand that something is wrong with the number and/or names of the files in my meta file vs. the experiments being considered for the analysis, but I cannot figure out the source of the error.

[screenshot of the error message]

sjroth commented 3 years ago

The issue is how DESeq2 is handling your data (hence, why I wanted you to open up a separate thread). Can you please copy and paste the output?

paulocaldas commented 3 years ago

the output from the ARTDeco -mode diff_exp_dogs? I basically get this error and empty folders ...

sjroth commented 3 years ago

The error message. Can you copy and paste that? It makes it easier for me to examine it than a screenshot.

paulocaldas commented 3 years ago

sure. here it goes

command line:

(ARTDeco) pcaldas@HP:/mnt/data/CREMCOMICSdo001/pcaldas/ctcf_human_artdeco/artdeco.Param2⟫ ARTDeco -mode diff_exp_read_in -meta-file input/dea_meta.txt -bam-files-dir mapping/

output:

Running diff_exp_read_in mode...
Loading ARTDeco file structure...
Reformatted meta file exists...
Reformatted comparisons file exists...
ARTDeco will generate the following files:
./diff_exp_read_in/control-ctcf_kd-read_in_assignment.txt
./diff_exp/control-ctcf_kd-results.txt
./diff_exp_read_in/control-ctcf_kd-read_in.txt
Running DESeq2 on gene expression data...
/mnt/data/CREMCOMICSdo001/comics/miniconda2/envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error in (function (countData, colData, design, tidy = FALSE, ignoreRank = FALSE, :
  ncol(countData) == nrow(colData) is not TRUE

  warnings.warn(x, RRuntimeWarning)
Combining differential expression results and read-in information...
Inferring read-in genes for upregulated genes with log2 fold change > 2, p-value < 0.05, and FPKM > 0.25...
Read-in level threshold is -1...
Using all genes...

sjroth commented 3 years ago

Perfect. Thank you. I'll take a look.

sjroth commented 3 years ago

Can you copy and paste your meta file as well as the first few rows (via a head command) of your gene expression file?

paulocaldas commented 3 years ago

my metadata file only has two lines for now, bc I was testing the program before using too many samples

Experiment                   Group
SRR1657556_Aligned_sorted    control
SRR1657557_Aligned_sorted    ctcf_kd

and my gene.exp.raw.txt file looks like this

ID                    Length      SRR1657556_Aligned_sorted    SRR1657557_Aligned_sorted
ENST00000623944.1     1027.0      198.5                        113.5
ENST00000555087.1     78686.0     146.0                        167.0
ENST00000587595.1     29520.0     15.0                         22.5
ENST00000323055.10    324040.0    18426.5                      20380.5
ENST00000607561.1     107.0       0.5                          1.5
ENST00000567827.1     10183.0     186.5                        183.5
ENST00000548247.1     27880.0     359.5                        376.5
ENST00000357771.5     7518.0      88.5                         83.0
ENST00000425674.1     37855.0     817.0                        883.0
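For anyone hitting the same DESeq2 message (ncol(countData) == nrow(colData) is not TRUE), a rough way to compare the two files before running ARTDeco is sketched below. This is not part of ARTDeco: the function name `sample_mismatch` is hypothetical, and it assumes both files are tab-delimited, that the meta file has a header row followed by Experiment/Group rows, and that the expression file starts with the ID and Length columns as shown above. It compares sample names, which is stricter than the column-count check DESeq2 reports.

```python
import csv
import io


def sample_mismatch(meta_text, counts_text):
    """Compare sample names in a meta file against an expression file header.

    Hypothetical helper, not an ARTDeco function. Assumes both inputs are
    tab-delimited text. Returns two lists:
    (samples in the meta file but not in the counts header,
     samples in the counts header but not in the meta file).
    """
    meta_rows = list(csv.reader(io.StringIO(meta_text), delimiter="\t"))
    # Skip the header row ("Experiment", "Group") and ignore blank lines.
    meta_samples = [row[0] for row in meta_rows[1:] if row]

    header = next(csv.reader(io.StringIO(counts_text), delimiter="\t"))
    # Assumption: the first two columns are ID and Length, as in this thread.
    count_samples = header[2:]

    missing_from_counts = [s for s in meta_samples if s not in count_samples]
    missing_from_meta = [s for s in count_samples if s not in meta_samples]
    return missing_from_counts, missing_from_meta


if __name__ == "__main__":
    # Sample names copied from the files posted in this thread.
    meta_text = (
        "Experiment\tGroup\n"
        "SRR1657556_Aligned_sorted\tcontrol\n"
        "SRR1657557_Aligned_sorted\tctcf_kd\n"
    )
    counts_text = (
        "ID\tLength\tSRR1657556_Aligned_sorted\tSRR1657557_Aligned_sorted\n"
        "ENST00000623944.1\t1027.0\t198.5\t113.5\n"
    )
    print(sample_mismatch(meta_text, counts_text))
```

Two empty lists mean the sample names agree. If the meta file were space-delimited instead of tab-delimited, each whole line would be read as one sample name and would show up in the first list.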

sjroth commented 3 years ago

This might be a dumb question, but is your meta file tab-delimited?
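A quick way to answer that question is sketched here. It is only an illustration, not something ARTDeco provides: the function name `non_tab_lines` is made up, and it assumes the meta file should have exactly two tab-separated fields per line (Experiment and Group).

```python
def non_tab_lines(meta_text, expected_fields=2):
    """Return 1-based numbers of non-blank lines that do not split into
    `expected_fields` fields on tab characters.

    Hypothetical helper; the two-column layout is an assumption based on
    the Experiment/Group meta file discussed in this thread.
    """
    bad = []
    for number, line in enumerate(meta_text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored
        if len(line.split("\t")) != expected_fields:
            bad.append(number)
    return bad


if __name__ == "__main__":
    # Read the meta file path used in this thread; adjust as needed.
    with open("input/dea_meta.txt") as handle:
        print(non_tab_lines(handle.read()))
```

An empty list means every non-blank line has two tab-separated fields. Any line numbers returned point to lines that likely use spaces or have extra columns.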

paulocaldas commented 3 years ago

so I kinda overcame the error, but I'm not sure why ...

I just deleted all my files and started over. But yes, my files were tab-delimited, so I'm not really sure what the problem was. I now have my control-experiment-results.txt inside a diff_exp_dogs folder. Sorry that my troubleshooting was not really helpful.

although, one thing that I noticed was that by running the general command with the meta file option

ARTDeco -mode get_dogs -gtf-file input/modified_annotation_file.gtf -chrom-sizes-file input/ref_genome.chrom.sizes -home-dir $PWD -bam-files-dir mapping/ -cpu 16 -min-dog-len 3000 -dog_window 300 -min_dog_coverage 0.3 -meta-file input/dea_meta.txt

this file is not created (differential analysis does not run). The analysis only worked when I used the diff_exp_dogs mode

ARTDeco -mode diff_exp_dogs -meta-file input/dea_meta.txt -bam-files-dir mapping/

sjroth commented 3 years ago

I'm confused what you are trying to note. Your initial command is for the diff_exp_read_in mode and this command is the diff_exp_dogs mode. You are changing run modes so it's impossible to know if you fixed the problem.

sjroth commented 3 years ago

What is your desired task? You seem to be changing it.

paulocaldas commented 3 years ago

maybe I'm mixing things up. My goal was to run differential gene expression analysis to check whether the genes that have transcriptional readthrough (DoGs) are highly expressed in my experiments vs. control. I understood that including the meta file in the first command line would run the diff_exp_dogs mode automatically.

sjroth commented 3 years ago

I would recommend not specifying a mode if you want all possible outputs. What you want is to look at the diff_exp_read_in folder and only consider the differential expression information. From that you can get the differentially expressed genes. Then, get the readthrough levels from readthrough.txt. This is the best way to get the information. I'm personally not a fan of DoGs as a measure of readthrough. Readthrough levels are far more representative.

I would also generate a comparisons file to make your life easier. Ideally, your final command would be the following:

ARTDeco -gtf-file input/modified_annotation_file.gtf -chrom-sizes-file input/ref_genome.chrom.sizes -home-dir $PWD -bam-files-dir mapping/ -cpu 16 -meta-file input/dea_meta.txt -comparisons-file comparisons.txt

The get_dogs mode will not generate the diff_exp_dogs folder. The output structure is explained in detail in the readme.

Does this help?

paulocaldas commented 3 years ago

That is in fact a discussion we're having in our group: what would be the best way to characterize readthrough. But either way, that was super helpful. I think everything is working for me now. I just generated all the outputs without any problem. Thanks a lot for all the quick feedback during the day!

btw, I've been playing with other tools, including DoGFinder, for the last few days, and I really enjoy how straightforward it is to run your tool without any issues. ARTDeco is really handy! Cheers!

sjroth commented 3 years ago

Apologies for the late response on this comment. I have been visiting family the past few days. I am glad that you find ARTDeco to be straightforward because that was one of my main goals.

From experience, I believe that readthrough level is the best single measure of transcriptional readthrough, in part because of its flexibility. The single best use for DoGs is to identify major regions of readthrough for visualization. That is what I use them for all the time.

FerallOut commented 10 months ago

Hey, first I wanted to thank you for writing such a useful tool!

I have some issues with getting the DoG differential expression, and the error is similar to the one in this issue.

I have tried both running in 'diff_exp_dogs' mode and without any specified mode, however I get no output, just this error.

DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  sys.exit(load_entry_point('ARTDeco==0.4', 'console_scripts', 'ARTDeco')())
/localenv/balan/miniconda3/envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error in (function (countData, colData, design, tidy = FALSE, ignoreRank = FALSE, :
  ncol(countData) == nrow(colData) is not TRUE

  warnings.warn(x, RRuntimeWarning)

Could you please point me in the right direction?

I would mention that both 'preprocess' and the 'get_dogs' modes work without error.

All the best, Mirela

sjroth commented 10 months ago

Can you please open a new issue for this?