sjroth / ARTDeco

Directories are empty when running differential expression modes #14

Closed PacoRM24 closed 1 year ago

PacoRM24 commented 1 year ago

Hi again Samuel. I'm running all the modes and there isn't any error in the output, but the directories where the results of differential expression analysis should be are empty (diff_exp, diff_exp_read_in, diff_exp_dogs).

The commands are:

ARTDeco -mode preprocess -gtf-file $GTF_FILE -chrom-sizes-file $CHROM_SIZES -layout PE -stranded True -orientation Forward -meta-file $META_FILE
ARTDeco -mode readthrough -gtf-file $GTF_FILE -layout PE -stranded True -orientation Forward
ARTDeco -mode get_dogs -gtf-file $GTF_FILE -chrom-sizes-file $CHROM_SIZES -layout PE -stranded True -orientation Forward
ARTDeco -mode diff_exp_read_in
ARTDeco -mode diff_exp_dogs

The preprocess, readthrough and get_dogs modes have run successfully (all files are generated).

The outputs of the diff_exp_read_in and diff_exp_dogs modes are the following:

/share/apps/External/Python-3.6.6/lib/python3.6/site-packages/rpy2/robjects/pandas2ri.py:14: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
  from pandas.core.index import Index as PandasIndex
Running diff_exp_read_in mode...
Loading ARTDeco file structure...
Reformatted meta file exists...
Reformatted comparisons file exists...
ARTDeco will generate the following files:
./diff_exp_read_in/Control-TPL-read_in.txt
./diff_exp_read_in/THZ-TPL-read_in_assignment.txt
./diff_exp/THZ-TPL-results.txt
./diff_exp_read_in/Control-TPL-read_in_assignment.txt
./diff_exp
./diff_exp_read_in/THZ-TPL-read_in.txt
./diff_exp_read_in/Control-THZ-read_in.txt
./diff_exp_read_in/Control-THZ-read_in_assignment.txt
./diff_exp_read_in
./diff_exp/Control-TPL-results.txt
./diff_exp/Control-THZ-results.txt
Creating differential expression output directory...
Running DESeq2 on gene expression data...
Creating differential expression with read-in information directory...
Combining differential expression results and read-in information...
Inferring read-in genes for upregulated genes with log2 fold change > 2, p-value < 0.05, and FPKM > 0.25...
Read-in level threshold is -1...
Using all genes...

/share/apps/External/Python-3.6.6/lib/python3.6/site-packages/rpy2/robjects/pandas2ri.py:14: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
  from pandas.core.index import Index as PandasIndex
Running diff_exp_dogs mode...
Loading ARTDeco file structure...
Reformatted meta file exists...
Reformatted comparisons file exists...
ARTDeco will generate the following files:
./diff_exp_dogs
./diff_exp_dogs/THZ-TPL-results.txt
./diff_exp_dogs/Control-THZ-results.txt
./diff_exp_dogs/Control-TPL-results.txt
Creating differential expression for DoGs directory...
Running DESeq2 on DoGs...

Could you please give me an idea of why this is happening?

sjroth commented 1 year ago

This looks like your meta file is messed up most likely. Can you show me the top line of the raw gene expression file and attach the meta file?

PacoRM24 commented 1 year ago

meta.reformatted.txt

ID Length Pancreatic_ControlR1 Pancreatic_ControlR2 Pancreatic_THZ1R1 Pancreatic_THZ1R2 Pancreatic_TPLR1 Pancreatic_TPLR2

sjroth commented 1 year ago

That looks fine... Hmm. Are you running this in the ARTDeco conda environment?

PacoRM24 commented 1 year ago

"Are you running this in the ARTDeco conda environment?" Yes.

sjroth commented 1 year ago

This is puzzling. I would first verify that the names in the reformatted meta match those of the reformatted experiment names. Then, inspect the raw expression file. Something strange is going on with DESeq2 here.

PacoRM24 commented 1 year ago

What do you mean by "reformatted experiment names"?

sjroth commented 1 year ago

ARTDeco renames the experiments in the downstream processing files so that the names work in R.
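The name comparison suggested above can be scripted. The following is a minimal sketch, not part of ARTDeco: it assumes the reformatted meta file is tab-delimited with an `Experiment` column, and that the raw expression file has a tab-delimited header starting with `ID` and `Length` followed by one column per sample (as in the header line posted earlier). The function names are hypothetical.

```python
import csv


def read_meta_experiments(meta_path, column="Experiment"):
    """Return the experiment names listed in a tab-delimited meta file.

    Assumption: the file has a header row containing `column`.
    """
    with open(meta_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row[column] for row in reader]


def read_expression_samples(expression_path, leading_columns=("ID", "Length")):
    """Return the sample column names from the header of the expression file."""
    with open(expression_path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    return [name for name in header if name not in leading_columns]


def compare_names(meta_names, sample_names):
    """Report names that appear on only one side of the comparison."""
    meta_set, sample_set = set(meta_names), set(sample_names)
    return {
        "only_in_meta": sorted(meta_set - sample_set),
        "only_in_expression": sorted(sample_set - meta_set),
    }
```

Calling `compare_names(read_meta_experiments("meta.reformatted.txt"), read_expression_samples("gene.exp.raw.txt"))` with the paths adjusted to wherever those files live should return two empty lists when the names agree.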

PacoRM24 commented 1 year ago

"I would first verify that the names in the reformatted meta match those of the reformatted experiment names." From what I can see, they look the same.

sjroth commented 1 year ago

Okay. If you attach the raw gene expression file, I can take a look, but this is about as far as I can go without direct access to the data.

PacoRM24 commented 1 year ago

gene.exp.raw.txt

Okay. Thank you. Here is the gene.exp.raw.txt file.

sjroth commented 1 year ago

No worries! I will take a look this afternoon.

zhangyf1225 commented 1 year ago

I have the same problem as PacoRM24, except that I get this error at the end:

/disk/anaconda3/envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed

warnings.warn(x, RRuntimeWarning)

It's puzzling because I used the same code about 2 months ago and it all went quite well.

zhangyf1225 commented 1 year ago

Oh, I think I have found the reason for my error. I mistakenly filled in the two replicates in the Experiment column of meta.txt with the same prefix. When I corrected this, it ran fine.
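A quick way to catch this kind of mistake before running the differential expression modes is to look for repeated names in the meta file. This is a minimal sketch under the assumption that the file is tab-delimited with an `Experiment` header column; the function name is hypothetical, and it only detects exact duplicates, so running it on the reformatted meta file (after ARTDeco's renaming) is probably the more informative check.

```python
import csv
from collections import Counter


def find_duplicate_experiments(meta_path, column="Experiment"):
    """Return experiment names that occur more than once in the meta file.

    Assumption: tab-delimited file with a header row containing `column`.
    """
    with open(meta_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        counts = Counter(row[column] for row in reader)
    return sorted(name for name, count in counts.items() if count > 1)
```

An empty list means every experiment name is unique. Any name that is returned would be a candidate for the "duplicate 'row.names'" error reported by R.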

faleevz commented 3 months ago

Hi there, unfortunately I now have the same problem when running diff_exp_read_in.

ARTDeco -mode diff_exp_read_in -meta-file /home/mfaleeva/ARTDeco-master/meta.txt [-home-dir /home/mfaleeva/ARTDeco-master_3  -bam-files-dir /home/mfaleeva/ARTDeco-master_3/bam  -layout SE -stranded TRUE  -orientation Reverse -comparisons-file /home/mfaleeva/ARTDeco-master/comparisons.txt]
Running diff_exp_read_in mode...
Loading ARTDeco file structure...
Reformatted meta file exists...
Reformatted comparisons file exists...
ARTDeco will generate the following files:
./diff_exp_read_in/ND-dD-read_in.txt
./diff_exp_read_in/dD-NP-read_in_assignment.txt
./diff_exp_read_in/ND-NP-read_in.txt
./diff_exp/ND-dP_-results.txt
./diff_exp
./diff_exp/ND-dD-results.txt
./diff_exp_read_in/ND-dD-read_in_assignment.txt
./diff_exp/dD-dP-results.txt
./diff_exp_read_in/ND-dP_-read_in_assignment.txt
./diff_exp_read_in/ND-dP_-read_in.txt
./diff_exp_read_in/ND-NP-read_in_assignment.txt
./diff_exp_read_in/dD-dP-read_in.txt
./diff_exp/ND-NP-results.txt
./diff_exp/dD-NP-results.txt
./diff_exp_read_in/dD-dP-read_in_assignment.txt
./diff_exp_read_in
./diff_exp_read_in/dD-NP-read_in.txt
Creating differential expression output directory...
Running DESeq2 on gene expression data...
Creating differential expression with read-in information directory...
Combining differential expression results and read-in information...
Inferring read-in genes for upregulated genes with log2 fold change > 2, p-value < 0.05, and FPKM > 0.25...
Read-in level threshold is -1...
Using all genes...
(artdeco) [mfaleeva@papr-res-compute05 ARTDeco-master_3]$ 

I get two empty folders (diff_exp_read_in and diff_exp).
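Since ARTDeco prints the files it intends to generate, the captured stdout can be compared against what is actually on disk. The sketch below is an illustration rather than an ARTDeco feature: it assumes the stdout was saved to a text file or string, and that the planned paths follow the "ARTDeco will generate the following files:" line, one per line, as in the log above. The function names are hypothetical.

```python
import os

MARKER = "ARTDeco will generate the following files:"


def expected_outputs(log_text):
    """Collect the paths listed directly after the marker line in the log."""
    paths = []
    collecting = False
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line == MARKER:
            collecting = True
            continue
        if collecting:
            # The list ends at the first line that does not look like a path.
            if line.startswith(("./", "/")):
                paths.append(line)
            else:
                break
    return paths


def missing_outputs(paths, base_dir="."):
    """Return the expected paths that do not exist relative to base_dir."""
    return [p for p in paths if not os.path.exists(os.path.join(base_dir, p))]
```

Passing the run's working directory as `base_dir` resolves the relative `./` paths. Absolute paths in the log are checked as they are.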

I am running it in a conda environment, the meta file is tab-delimited, and the names in the raw gene expression file and the meta file seem to be the same. I have attached the raw gene expression file and the meta file; could you take a look and help me work out what's wrong? I couldn't find an explanation in the previous comments. Thank you so much.

meta.reformatted.txt

gene.exp.raw.txt

sjroth commented 3 months ago

Hi @faleevz,

You are missing an experiment in your meta file.

faleevz commented 3 months ago

Hi Sam, thanks for spotting it. I've re-run the whole thing with the adjusted meta file but it still gives the same problem. Any clue as to what else can be causing this?

sjroth commented 3 months ago

Did you delete meta.reformatted.txt and comparisons.reformatted.txt?
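The log line "Reformatted meta file exists..." suggests that existing reformatted files are reused on later runs, although that is an inference from the output and not something confirmed here. Under that assumption, a minimal sketch for locating and removing them before re-running could look like this. The function name is hypothetical, and the search is recursive because the exact location of the files inside the home directory is not stated in this thread.

```python
from pathlib import Path

# File names taken from this thread.
STALE_NAMES = ("meta.reformatted.txt", "comparisons.reformatted.txt")


def remove_reformatted_files(home_dir, dry_run=True):
    """Find the reformatted files under home_dir and optionally delete them.

    With dry_run=True (the default) nothing is deleted; the matches are
    only returned so they can be reviewed first.
    """
    found = sorted(
        path for name in STALE_NAMES for path in Path(home_dir).rglob(name)
    )
    if not dry_run:
        for path in found:
            path.unlink()
    return found
```

Calling it once with the default `dry_run=True` shows what would be removed; calling it again with `dry_run=False` deletes those files.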

faleevz commented 3 months ago

Yes I did delete those files

sjroth commented 3 months ago

Can you link the newly generated files again? Are there any errors in the stdout?

faleevz commented 3 months ago

meta.reformatted.txt gene.exp.raw.txt

No errors, it always just finishes on Using all genes...

faleevz commented 3 months ago

I looked through other comments and noticed that the modified comparisons file should have a "-" in it. For some reason, mine does not. I have attached both files (mine and the reformatted one), but they seem to be the same. Could this be the problem? newcomp.txt comparisons.reformatted.txt

sjroth commented 3 months ago

What command are you running? I noticed that your original command still includes the square brackets. Also, we can schedule a call. This is a high level of involvement for me and a lot of expended time.

faleevz commented 3 months ago

That would be incredibly helpful! I have copied in the code.

ARTDeco -mode preprocess -gtf-file /home/mfaleeva/ARTDeco-master_2/newgenes.gtf -chrom-sizes-file /home/mfaleeva/ARTDeco-master_2/genome.chrom.sizes [-home-dir /home/mfaleeva/ARTDeco-master_2-bam-files-dir /home/mfaleeva/ARTDeco-master_2/bam  -layout SE -stranded TRUE -orientation Reverse -cpu 8 -meta-file /home/mfaleeva/ARTDeco-master_2/meta.txt -comparisons-file /home/mfaleeva/ARTDeco-master_2/newcomp.txt ]

ARTDeco -mode readthrough -gtf-file /home/mfaleeva/ARTDeco-master_2/newgenes.gtf [-home-dir /home/mfaleeva/ARTDeco-master_2 -bam-files-dir /home/mfaleeva/ARTDeco-master_2/bam -layout SE -stranded TRUE -orientation Reverse -cpu 5 ]

ARTDeco -mode get_dogs -gtf-file /home/mfaleeva/ARTDeco-master_2/newgenes.gtf -chrom-sizes-file /home/mfaleeva/ARTDeco-master_2/genome.chrom.sizes [-home-dir /home/mfaleeva/ARTDeco-master_2 -bam-files-dir /home/mfaleeva/ARTDeco-master_2/bam  -layout SE -stranded TRUE -orientation Reverse -cpu 2 ]

ARTDeco -mode diff_exp_read_in -meta-file /home/mfaleeva/ARTDeco-master_2/meta.txt [-home-dir /home/mfaleeva/ARTDeco-master_2  -bam-files-dir /home/mfaleeva/ARTDeco-master_2/bam  -layout SE -stranded TRUE  -orientation Reverse -comparisons-file /home/mfaleeva/ARTDeco-master_2/newcomp.txt ]

sjroth commented 3 months ago

Why are you including square brackets (the "[" and "]" characters)? Please omit them.

Before we book a call, could you delete all of the files and re-run all of the commands without the brackets?

Also, do you have DESeq2 successfully installed in R?
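One way to answer the DESeq2 question from inside the same environment that runs ARTDeco is to ask the `Rscript` on the current `PATH` for the package version. This is a minimal sketch with a hypothetical function name; it assumes `Rscript` is available in the activated environment, and it avoids newer `subprocess` options so that it also runs on Python 3.6.

```python
import subprocess


def deseq2_version(rscript="Rscript"):
    """Return the DESeq2 version reported by `rscript`, or None.

    None is returned when the executable cannot be run or when R exits
    with an error (for example, because the package is not installed).
    """
    command = [rscript, "-e", 'cat(as.character(packageVersion("DESeq2")))']
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None
```

Running `print(deseq2_version())` with the environment activated shows which DESeq2, if any, that particular R installation can see. That may differ from an R session opened elsewhere on the system.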

faleevz commented 3 months ago

I deleted all the files, and re-ran all of the commands without the [ ]

ARTDeco -mode preprocess -gtf-file /home/mfaleeva/ARTDeco-master_2/newgenes.gtf -chrom-sizes-file /home/mfaleeva/ARTDeco-master_2/genome.chrom.sizes -home-dir /home/mfaleeva/ARTDeco-master_2 -bam-files-dir /home/mfaleeva/ARTDeco-master_2/bam  -layout SE -stranded TRUE -orientation Reverse -cpu 8 -meta-file /home/mfaleeva/ARTDeco-master_2/meta.txt -comparisons-file /home/mfaleeva/ARTDeco-master_2/newcomp.txt

ARTDeco -mode readthrough -gtf-file /home/mfaleeva/ARTDeco-master_2/newgenes.gtf -home-dir /home/mfaleeva/ARTDeco-master_2 -bam-files-dir /home/mfaleeva/ARTDeco-master_2/bam -layout SE -stranded TRUE -orientation Reverse -cpu 5

ARTDeco -mode get_dogs -gtf-file /home/mfaleeva/ARTDeco-master_2/newgenes.gtf -chrom-sizes-file /home/mfaleeva/ARTDeco-master_2/genome.chrom.sizes -home-dir /home/mfaleeva/ARTDeco-master_2 -bam-files-dir /home/mfaleeva/ARTDeco-master_2/bam  -layout SE -stranded TRUE -orientation Reverse -cpu 8

ARTDeco -mode diff_exp_read_in -meta-file /home/mfaleeva/ARTDeco-master_2/meta.txt -home-dir /home/mfaleeva/ARTDeco-master_2  -bam-files-dir /home/mfaleeva/ARTDeco-master_2/bam  -layout SE -stranded TRUE  -orientation Reverse -comparisons-file /home/mfaleeva/ARTDeco-master_2/newcomp.txt

I have also already installed and loaded DESeq2 successfully in my R environment:

> packageVersion("DESeq2")
[1] ‘1.42.1’

I get the same outcome unfortunately

(artdeco) [mfaleeva@papr-res-compute215 ARTDeco-master_2]$ ARTDeco -mode diff_exp_read_in -meta-file /home/mfaleeva/ARTDeco-master_2/meta.txt -home-dir /home/mfaleeva/ARTDeco-master_2  -bam-files-dir /home/mfaleeva/ARTDeco-master_2/bam  -layout SE -stranded TRUE  -orientation Reverse -comparisons-file /home/mfaleeva/ARTDeco-master_2/newcomp.txt
Running diff_exp_read_in mode...
Loading ARTDeco file structure...
Reformatted meta file exists...
Reformatted comparisons file exists...
ARTDeco will generate the following files:
/home/mfaleeva/ARTDeco-master_2/diff_exp/NP-dP-results.txt
/home/mfaleeva/ARTDeco-master_2/diff_exp_read_in/NP-dP-read_in.txt
/home/mfaleeva/ARTDeco-master_2/diff_exp_read_in/NP-dP-read_in_assignment.txt
/home/mfaleeva/ARTDeco-master_2/diff_exp_read_in/dD-dP-read_in_assignment.txt
/home/mfaleeva/ARTDeco-master_2/diff_exp_read_in/ND-dD-read_in.txt
/home/mfaleeva/ARTDeco-master_2/diff_exp_read_in/dD-dP-read_in.txt
/home/mfaleeva/ARTDeco-master_2/diff_exp/ND-dD-results.txt
/home/mfaleeva/ARTDeco-master_2/diff_exp_read_in/ND-dD-read_in_assignment.txt
/home/mfaleeva/ARTDeco-master_2/diff_exp/dD-dP-results.txt
Running DESeq2 on gene expression data...
Combining differential expression results and read-in information...
Inferring read-in genes for upregulated genes with log2 fold change > 2, p-value < 0.05, and FPKM > 0.25...
Read-in level threshold is -1...
Using all genes...
(artdeco) [mfaleeva@papr-res-compute215 ARTDeco-master_2]$

Many thanks, I really appreciate your help in this.

sjroth commented 3 months ago

I have no idea what is going on. Let me do some digging on this.

sjroth commented 3 months ago

Hi @faleevz ,

I will be starting my refactor/update of the code base with this issue in mind (as well as others). You have inspired me! Would you be willing to be a tester in the near future?

Best, Sam

faleevz commented 3 months ago

Hi Sam, my apologies for the bother I've been causing! I would love to be a tester; I've got lots of files to analyse coming up :) It's a really cool package you've written. For now, I've just run the differential expression manually in DESeq2. Thanks so much!