sjroth / ARTDeco

diff_exp_read_in error #6

Closed pillailab closed 2 years ago

pillailab commented 3 years ago

Hello, I had successfully run all the pipelines on our cluster (using the .yml conda installation, as suggested), but now I am encountering this error in the differential read-in mode (it looks like a DESeq2 error). Please advise.

Running diff_exp_read_in mode...

Loading ARTDeco file structure...

Reformatted meta file exists...

Reformatted comparisons file exists...

ARTDeco will generate the following files:

./diff_exp_read_in/WT_Nuclear-Mut_Nuclear-read_in_assignment.txt

./diff_exp_read_in/WT_Nuclear-Mut_Nuclear-read_in.txt

./diff_exp

./diff_exp_read_in

./diff_exp/WT_Nuclear-Mut_Nuclear-results.txt

Creating differential expression output directory...

Running DESeq2 on gene expression data...

/gpfs/ysm/project/pillai/mp758/conda_envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: estimating size factors

warnings.warn(x, RRuntimeWarning)

/gpfs/ysm/project/pillai/mp758/conda_envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: estimating dispersions

warnings.warn(x, RRuntimeWarning)

/gpfs/ysm/project/pillai/mp758/conda_envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: gene-wise dispersion estimates

warnings.warn(x, RRuntimeWarning)

/gpfs/ysm/project/pillai/mp758/conda_envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: mean-dispersion relationship

warnings.warn(x, RRuntimeWarning)

/gpfs/ysm/project/pillai/mp758/conda_envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: final dispersion estimates

warnings.warn(x, RRuntimeWarning)

/gpfs/ysm/project/pillai/mp758/conda_envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: fitting model and testing

warnings.warn(x, RRuntimeWarning)

Output DESeq2 results...

/gpfs/ysm/project/pillai/mp758/conda_envs/ARTDeco/lib/python3.6/site-packages/rpy2/robjects/pandas2ri.py:191: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.

res = PandasDataFrame.from_items(items)

Creating differential expression with read-in information directory...

Combining differential expression results and read-in information...

Inferring read-in genes for upregulated genes with log2 fold change > 2, p-value < 0.05, and FPKM > 0.25...

Read-in level threshold is -1...

Using all genes...

sjroth commented 3 years ago

These are warnings, not errors. They are not indicative of a problem.

Best, Sam

pillailab commented 3 years ago

Hi Sam, although it says warning, the diff_exp and diff_exp_read_in folders are empty in this case. I think DESeq2 is encountering some formatting issue. Here is the output:

Running diff_exp_read_in mode...

Loading ARTDeco file structure...

Reformatted meta file exists...

Reformatted comparisons file exists...

ARTDeco will generate the following files:

./diff_exp_read_in/_SF3B1-_Control-read_in_assignment.txt

./diff_exp

./diff_exp/_SF3B1-_Control-results.txt

./diff_exp_read_in/_SF3B1-_Control-read_in.txt

./diff_exp_read_in

Creating differential expression output directory...

Running DESeq2 on gene expression data...

/gpfs/ysm/project/pillai/mp758/conda_envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: Error in .validate_names(colnames, ans_colnames, "assay colnames()", "colData rownames()") :

assay colnames() must be NULL or identical to colData rownames()

warnings.warn(x, RRuntimeWarning)

Creating differential expression with read-in information directory...

Combining differential expression results and read-in information...

Inferring read-in genes for upregulated genes with log2 fold change > 2, p-value < 0.05, and FPKM > 0.25...

Read-in level threshold is -1...

Using all genes...

sjroth commented 3 years ago

This looks like there is a mismatch between the experiment names in your meta file and your experiments being considered.

paulocaldas commented 3 years ago

Hi Sam.

I'm also having the same problem (mismatch between experiment names), but I cannot find the reason. My "experiments being considered" are the ones in the bam-files directory, is that correct? So the Experiment column in the meta file should have the same number and names as the files in the bam-files directory, I assume. Or am I doing something wrong?

sjroth commented 3 years ago

Yes. The names of the files without the ".bam" suffix should be there.
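A check along these lines can be scripted. The following is a minimal sketch, not part of ARTDeco: it assumes the meta file is tab-delimited with an `Experiment` header column, and the function name `compare_meta_to_bams` is made up for illustration.

```python
import csv
from pathlib import Path


def compare_meta_to_bams(meta_path, bam_dir):
    """Return (names only in the meta file, names only in the BAM directory)."""
    # Assumption: the meta file is tab-delimited with an "Experiment" header.
    with open(meta_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        meta_names = {row["Experiment"] for row in reader}
    # Strip the ".bam" suffix from every BAM file name in the directory.
    bam_names = {path.name[: -len(".bam")] for path in Path(bam_dir).glob("*.bam")}
    return sorted(meta_names - bam_names), sorted(bam_names - meta_names)
```

Two empty lists mean every experiment name has a matching BAM file and vice versa; any name that appears in either list is a candidate cause of the mismatch.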

paulocaldas commented 3 years ago

Ok, so I'm not doing anything wrong as far as I can tell.

image

Do you have any suggestions for tracking down the problem?

sjroth commented 3 years ago

Check your gene expression file. The error is that the columns in that file do not match the row names in the meta file.
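Because the DESeq2 message requires the two sets of names to be identical, the comparison has to respect order as well as spelling. Here is a rough sketch of such a comparison. It is not ARTDeco code: the helper names are hypothetical, and it assumes the expression file has a tab-delimited header whose annotation columns are called `ID` and `Length`.

```python
from itertools import zip_longest


def sample_columns(header_line, annotation_columns=("ID", "Length")):
    """Return the sample column names from a tab-delimited header line."""
    # Assumption: ID and Length are annotation columns, not samples.
    fields = header_line.rstrip("\r\n").split("\t")
    return [name for name in fields if name not in annotation_columns]


def find_name_mismatches(columns, meta_rows):
    """List (position, column name, meta name) wherever the two differ."""
    mismatches = []
    # zip_longest pads the shorter list with None so extra names are reported.
    for position, (column, row) in enumerate(zip_longest(columns, meta_rows)):
        if column != row:
            mismatches.append((position, column, row))
    return mismatches


# Hypothetical sample names, for illustration only.
columns = sample_columns("ID\tLength\tctrl_1\ttreat_1\n")
print(find_name_mismatches(columns, ["ctrl_1", "treat1"]))
```

An empty list means the names line up position by position; anything else pinpoints where the expression header and the meta file diverge.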

paulocaldas commented 3 years ago

They seem fine, assuming that ID and Length are dropped during the process.

image

sjroth commented 3 years ago

Can you send the stdout from running ARTDeco?

paulocaldas commented 3 years ago

image

sjroth commented 3 years ago

This is a different error from the one in this thread. Please open up a different thread for this.

AdrianaLecourieux commented 2 years ago

Hello,

I came here because I have exactly the same problem that Paulo had before:

Error in .validate_names(colnames, ans_colnames, "assay colnames()", "colData rownames()") : assay colnames() must be NULL or identical to colData rownames()

I checked my gene expression file and my meta file, and the names are the same. Do you have a solution for that?

sjroth commented 2 years ago

Hi @AdrianaLecourieux,

Can you please provide the command line output, meta file, and first row of the gene expression file? I cannot confirm if this is a bug or user error if those are not provided.

AdrianaLecourieux commented 2 years ago

The input :

ARTDeco -mode diff_exp_read_in -meta-file META_FILE meta.reformatted.txt -bam-files-dir /nameofbamdir

The meta_file :

Experiment Group

merged_A-siPNUTS7 siPNUTS7

merged_A-sict sict

merged_B-siPNUTS7 siPNUTS7

merged_B-sict sict

merged_C-siPNUTS7 siPNUTS7

merged_C-sict sict

The raw gene expression :

ID Length merged_A_siPNUTS7 merged_A_sict merged_B_siPNUTS7 merged_B_sict merged_C_siPNUTS7 merged_C_sict

ENST00000372733.3 23136.0 3633.5 1566.0 3265.5 1323.5 3144.0 1339.0

ENST00000610959.4 4145.0 20.5 38.5 30.0 48.0 27.0 40.0

The error message :

Running diff_exp_read_in mode...

Loading ARTDeco file structure...

/home/adriana/anaconda3/envs/ARTDeco/lib/python3.6/site-packages/rpy2/robjects/pandas2ri.py:17: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.

from pandas.core.index import Index as PandasIndex

Reformatted meta file exists...

Reformatted comparisons file exists...

ARTDeco will generate the following files:

./diff_exp_read_in/siPNUTS7-sict-read_in_assignment.txt

./diff_exp_read_in/siPNUTS7-sict-read_in.txt

./diff_exp/siPNUTS7-sict-results.txt

Running DESeq2 on gene expression data...

/home/adriana/anaconda3/envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: Error in .validate_names(colnames, ans_colnames, "assay colnames()", "colData rownames()") :

assay colnames() must be NULL or identical to colData rownames()

warnings.warn(x, RRuntimeWarning)

Combining differential expression results and read-in information...

Inferring read-in genes for upregulated genes with log2 fold change > 2, p-value < 0.05, and FPKM > 0.25...

Read-in level threshold is -1...

Using all genes...

sjroth commented 2 years ago

Thank you. Is the meta file tab-delimited?
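Tabs and spaces look alike in most editors, so it can help to check the delimiter with a few lines of code. This is only an illustrative sketch: the function name is hypothetical, and it assumes each meta file line should have exactly two tab-separated fields (Experiment and Group), as in the file pasted above.

```python
def malformed_meta_lines(text, expected_fields=2):
    """Return 1-based numbers of non-empty lines that do not split into
    expected_fields fields on tab characters."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines; flag lines with the wrong number of tab-separated fields.
        if line.strip() and len(line.split("\t")) != expected_fields:
            bad.append(number)
    return bad


# The second line uses a space instead of a tab, so it is reported.
print(malformed_meta_lines("Experiment\tGroup\nsample1 groupA\n"))
```

Passing the contents of the meta file (for example via `open(path).read()`) returns the line numbers that need fixing, or an empty list if every line is tab-delimited.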

AdrianaLecourieux commented 2 years ago

Yes !

sjroth commented 2 years ago

Hmmm... That is odd. Can you paste the meta.reformatted.txt file? The error is saying that the column names of the expression data file and the row names of the meta file are not matching up (a common error for DESeq2).

AdrianaLecourieux commented 2 years ago

The meta.reformatted.txt file is the meta file that I sent :)

sjroth commented 2 years ago

It shouldn't be. The character "-" is replaced with "_" when the meta file is reformatted by ARTDeco.
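To make the mismatch concrete, the sketch below applies that substitution to the experiment names pasted earlier in this thread and compares them with the expression file's sample columns. The `reformat_name` helper is a stand-in written for this example; ARTDeco's actual reformatting code may do more than this.

```python
def reformat_name(name):
    # Assumption: mimics the "-" to "_" substitution described above.
    return name.replace("-", "_")


# Experiment names from the pasted meta file.
meta_experiments = [
    "merged_A-siPNUTS7", "merged_A-sict",
    "merged_B-siPNUTS7", "merged_B-sict",
    "merged_C-siPNUTS7", "merged_C-sict",
]
# Sample columns from the pasted gene expression header (ID and Length omitted).
expression_columns = [
    "merged_A_siPNUTS7", "merged_A_sict",
    "merged_B_siPNUTS7", "merged_B_sict",
    "merged_C_siPNUTS7", "merged_C_sict",
]

raw_matches = meta_experiments == expression_columns
reformatted_matches = [reformat_name(n) for n in meta_experiments] == expression_columns
print(raw_matches, reformatted_matches)  # prints: False True
```

The hyphenated names only line up with the expression columns after the substitution, which is why a hand-written meta.reformatted.txt that still contains hyphens triggers the DESeq2 name check.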

AdrianaLecourieux commented 2 years ago

Yes, I saw that, and it was the same in paulocaldas's comment.

AdrianaLecourieux commented 2 years ago

I don't know why, because the meta.reformatted.txt file is exactly the same.

sjroth commented 2 years ago

Are you creating the meta.reformatted.txt or did ARTDeco generate it? If you are creating it, you are messing with ARTDeco's internal file dependency structure.

AdrianaLecourieux commented 2 years ago

Oh.. I created it.

sjroth commented 2 years ago

Please delete meta.reformatted.txt and comparisons.reformatted.txt. Then, name those files differently and place them outside the ARTDeco file structure. ARTDeco is designed to maintain its own internally consistent file structure. You have inadvertently messed with this structure.
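The cleanup step could be scripted roughly as follows. This is a sketch under assumptions, not an ARTDeco utility: the function name is hypothetical, and the directory that holds the reformatted files must be supplied by the caller because its location depends on the run.

```python
from pathlib import Path

# The two files named in the comment above.
REFORMATTED_FILES = ("meta.reformatted.txt", "comparisons.reformatted.txt")


def remove_reformatted_files(directory):
    """Delete hand-made reformatted files from directory; return the names removed."""
    removed = []
    for name in REFORMATTED_FILES:
        path = Path(directory) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed
```

The function leaves every other file untouched, so the user-supplied meta and comparisons files can then be saved under different names somewhere outside ARTDeco's directories.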

AdrianaLecourieux commented 2 years ago

If I rename these files to meta.txt and comparisons.txt, is that OK? But I'll do as you say.

sjroth commented 2 years ago

Yes. That will solve it. I would also place it outside the preprocess directory. It is not good practice to alter a pipeline's file structure.

AdrianaLecourieux commented 2 years ago

Ok, thank you. I'll try that and let you know if it works.

AdrianaLecourieux commented 2 years ago

Hi! I tried this morning and now I have another problem: the command works, but my diff_exp files are empty. The other files are the same ones I have had since yesterday.

input : ARTDeco -mode diff_exp_read_in -meta-file meta.txt -bam-files-dir /mnt/c/Users/adriana.lecourieux/Desktop/Stage/ARTDeco/test/bam

Running diff_exp_read_in mode...

Loading ARTDeco file structure...

/home/adriana/anaconda3/envs/ARTDeco/lib/python3.6/site-packages/rpy2/robjects/pandas2ri.py:17: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.

from pandas.core.index import Index as PandasIndex

Reformatted meta file exists...

Reformatted comparisons file exists...

ARTDeco will generate the following files:

./diff_exp_read_in/siPNUTS7-sict-read_in_assignment.txt

./diff_exp_read_in/siPNUTS7-sict-read_in.txt

./diff_exp/siPNUTS7-sict-results.txt

Running DESeq2 on gene expression data...

/home/adriana/anaconda3/envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: estimating size factors

warnings.warn(x, RRuntimeWarning)

/home/adriana/anaconda3/envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: estimating dispersions

warnings.warn(x, RRuntimeWarning)

/home/adriana/anaconda3/envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: gene-wise dispersion estimates

warnings.warn(x, RRuntimeWarning)

/home/adriana/anaconda3/envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: mean-dispersion relationship

warnings.warn(x, RRuntimeWarning)

/home/adriana/anaconda3/envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: final dispersion estimates

warnings.warn(x, RRuntimeWarning)

/home/adriana/anaconda3/envs/ARTDeco/lib/python3.6/site-packages/rpy2/rinterface/init.py:146: RRuntimeWarning: fitting model and testing

warnings.warn(x, RRuntimeWarning)

Output DESeq2 results...

Combining differential expression results and read-in information...

Inferring read-in genes for upregulated genes with log2 fold change > 2, p-value < 0.05, and FPKM > 0.25...

Read-in level threshold is -1...

Using all genes...

sjroth commented 2 years ago

Did you delete the comparisons.reformatted.txt when you deleted your initial meta.reformatted.txt?

AdrianaLecourieux commented 2 years ago

Yes, I did.

sjroth commented 2 years ago

Okay. Can you give me the entire meta.reformatted.txt, the entire comparisons.reformatted.txt, and the first few lines of the raw gene expression file? Delete all of those files and try re-running ARTDeco.

AdrianaLecourieux commented 2 years ago

I don't have meta.reformatted.txt and comparisons.reformatted.txt. When I ran readthrough mode, ARTDeco didn't create these files. I deleted everything and re-ran, and the result is the same.

sjroth commented 2 years ago

Okay. Re-run the preprocess mode then the diff_exp_read_in mode. Then, copy and paste those files as I requested.

AdrianaLecourieux commented 2 years ago

If I try to run preprocess without a comparisons file, I get this error:

KeyError: './preprocess_files/comparisons.reformmatted.txt'

sjroth commented 2 years ago

Let me rephrase: what are your input files? You are running into a lot of user input errors and this will move faster if you specify what your inputs are. Do you have a meta file and a comparisons file?

AdrianaLecourieux commented 2 years ago

Yes

sjroth commented 2 years ago

And please copy and paste the contents of your comparisons and meta files. Not the reformatted files. The raw files.

AdrianaLecourieux commented 2 years ago

Ok so my files are :

comparisons file :

siPNUTS7 sict

meta file

Experiment Group

merged_A-siPNUTS7 siPNUTS7

merged_A-sict sict

merged_B-siPNUTS7 siPNUTS7

merged_B-sict sict

merged_C-siPNUTS7 siPNUTS7

merged_C-sict sict

sjroth commented 2 years ago

What is your command for preprocess mode?

AdrianaLecourieux commented 2 years ago

ARTDeco -mode preprocess -gtf-file modified_genes.gtf -chrom-sizes-file hg38.chrom.sizes.txt -bam-files-dir /mnt/c/Users/adriana.lecourieux/Desktop/Stage/ARTDeco/test/bam/ -comparisons-file comparison.txt -meta-file meta.txt

I already tried without the comparisons file and meta file, and with all parameters.

sjroth commented 2 years ago

What is the command line stderr from that command?

sjroth commented 2 years ago

That is, when you run the command, what does ARTDeco print out on the command line? It is good practice to include this any time you submit an issue on GitHub, and it aids in debugging.
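One way to capture everything a command prints is to run it through a small wrapper that saves stdout and stderr to a file. The sketch below uses only the Python standard library; `run_and_log` is a hypothetical helper, not something ARTDeco provides.

```python
import subprocess


def run_and_log(command, log_path):
    """Run command, write its combined stdout and stderr to log_path,
    and return the exit code."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into the captured stdout
        universal_newlines=True,    # text mode; this spelling also works on Python 3.6
    )
    with open(log_path, "w") as handle:
        handle.write(result.stdout)
    return result.returncode
```

The command is passed as a list of arguments, for example `["ARTDeco", "-mode", "preprocess", "-meta-file", "meta.txt"]` followed by the remaining options used on the command line, and the resulting log file can be pasted into the issue as is.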