Open KamilMaliszArdigen opened 2 weeks ago
nf-core pipelines lint
overall result: Passed :white_check_mark: :warning:
Posted for pipeline commit b2a122b
+| ✅ 300 tests passed |+
#| ❔ 6 tests were ignored |#
!| ❗ 4 tests had warnings |!
Hi @pinin4fjords, this PR updates the limma module to its latest version. I will be more than happy to provide any additional information if needed.
@pinin4fjords I would really appreciate your input on this topic. Thank you in advance.
I am getting this error when running a simple comparison. It seems that in FILTER_DIFFTABLE it expects a log2FoldChange column which does not exist.
This is the command:
nextflow run output/differentialabundance/main.nf \
--input ${ diffabundance_samplesheet } \
--matrix ${ diffabundance_counts } \
--contrasts ${ diffabundance_contrasts } \
--outdir output \
--observations_id_col sample \
--features_id_col geneID \
-profile rnaseq \
--differential_use_limma
Workflow execution completed unsuccessfully
The exit status of the task that caused the workflow execution to fail was: 1
Error executing process > 'NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:FILTER_DIFFTABLE (1)'
Caused by:
Process `NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:FILTER_DIFFTABLE (1)` terminated with an error exit status (1)
Command executed:
#!/usr/bin/env python
from math import log2
from os import path
import pandas as pd
import platform
from sys import exit
# 1. Check that the current logFC/padj is not NA
# 2. Check that the current logFC is >= threshold (abs does not work, so use a workaround)
# 3. Check that the current padj is <= threshold
# If this is true, the row is written to the new file, otherwise not
if not any("VISIT_V05_vs_V02.limma.results.tsv".endswith(ext) for ext in [".csv", ".tsv", ".txt"]):
    exit("Please provide a .csv, .tsv or .txt file!")
table = pd.read_csv("VISIT_V05_vs_V02.limma.results.tsv", sep=("," if "VISIT_V05_vs_V02.limma.results.tsv".endswith(".csv") else " "), header=0)
logFC_threshold = log2(float("2.0"))
table = table[~table["log2FoldChange"].isna() &
    ~table["padj"].isna() &
    (pd.to_numeric(table["log2FoldChange"], errors='coerce').abs() >= float(logFC_threshold)) &
    (pd.to_numeric(table["padj"], errors='coerce') <= float("0.05"))]
table.to_csv(path.splitext(path.basename("VISIT_V05_vs_V02.limma.results.tsv"))[0]+"_filtered.tsv", sep=" ", index=False)
with open('versions.yml', 'a') as version_file:
    version_file.write('"NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:FILTER_DIFFTABLE":' + "\n")
    version_file.write(" pandas: " + str(pd.__version__) + "\n")
Command exit status:
1
Command output:
(empty)
Command error:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3803, in get_loc
return self._engine.get_loc(casted_key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'log2FoldChange'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ".command.sh", line 18, in <module>
table = table[~table["log2FoldChange"].isna() &
~~~~~^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 3805, in __getitem__
indexer = self.columns.get_loc(key)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
raise KeyError(key) from err
KeyError: 'log2FoldChange'
@DSchreyer The column names in the DESeq2 and limma outputs differ, so we created a dedicated profile, rnaseq_limma, which adjusts them for the downstream steps. You can see in detail which options are set here: https://github.com/nf-core/differentialabundance/blob/81cc521ed629231905d4fc762f4dc5c8d7561de0/conf/rnaseq_limma.config#L26
Please try launching the pipeline with the following command:
nextflow run output/differentialabundance/main.nf \
--input ${ diffabundance_samplesheet } \
--matrix ${ diffabundance_counts } \
--contrasts ${ diffabundance_contrasts } \
--outdir output \
--observations_id_col sample \
--features_id_col geneID \
-profile rnaseq_limma
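To illustrate why the column names matter, here is a minimal sketch of the same filtering step with the column names passed in rather than hard-coded. It is not the pipeline's FILTER_DIFFTABLE implementation: the function name, the COLUMN_NAMES mapping, and the toy table are hypothetical, and it assumes the limma table keeps topTable's default names (logFC, adj.P.Val) while DESeq2 uses log2FoldChange and padj.

```python
import csv
import io
from math import isnan, log2

# Assumed column names per tool (DESeq2 results vs. limma topTable defaults).
COLUMN_NAMES = {
    "deseq2": {"fc": "log2FoldChange", "padj": "padj"},
    "limma": {"fc": "logFC", "padj": "adj.P.Val"},
}


def filter_difftable(handle, fc_column, padj_column, min_fold_change=2.0, max_padj=0.05):
    """Return rows passing both thresholds; fail early if a column is missing."""
    reader = csv.DictReader(handle, delimiter="\t")
    available = reader.fieldnames or []
    missing = [name for name in (fc_column, padj_column) if name not in available]
    if missing:
        raise KeyError(f"columns {missing} not found; available: {available}")

    threshold = log2(min_fold_change)
    kept = []
    for row in reader:
        try:
            fold_change = float(row[fc_column])
            padj = float(row[padj_column])
        except (TypeError, ValueError):
            continue  # skip "NA" or other non-numeric entries
        if isnan(fold_change) or isnan(padj):
            continue
        if abs(fold_change) >= threshold and padj <= max_padj:
            kept.append(row)
    return kept


# Toy limma-style table (made-up values, for illustration only).
limma_table = (
    "gene_id\tlogFC\tadj.P.Val\n"
    "g1\t1.5\t0.01\n"
    "g2\t-2.0\t0.2\n"
    "g3\tNA\t0.01\n"
    "g4\t-1.2\t0.03\n"
)
cols = COLUMN_NAMES["limma"]
kept = filter_difftable(io.StringIO(limma_table), cols["fc"], cols["padj"])
print([row["gene_id"] for row in kept])
```

Calling the same function with the DESeq2 names on this table raises a KeyError that lists the available columns, which is the situation in the traceback above.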
@KamilMaliszArdigen Thanks, yes this helped to solve some of the issues. However, I am still getting the following error in the PLOT_EXPLORATORY process with my own dataset.
In cond_log2_transform_matrix(matrix_data = assay_data[[index]], :
NaNs produced
Fontconfig error: No writable cache directories
Error in density.default(cond_log2_transform_matrix(plotmatrices[[pm]][, :
'x' contains missing values
Calls: ggplot_densityplot ... lapply -> FUN -> density -> density -> density.default
Execution halted
Could this be caused by negative values in the normalised count matrix? I also tested with the test_rnaseq_limma profile, which works as expected. However, I checked the normalised count tables from the test data and they contain no negative values. Could this be the issue?
@DSchreyer The main difference is that the rnaseq profile sets exploratory_log2_assays = 'raw,normalised', whereas rnaseq_limma leaves it empty. I'm not sure what is happening with your data. Please take a look at the work directory and the LIMMA count table; I expect that negative values might be present. If so, exploratory_log2_assays needs to stay empty, at least as far as I understand the plotting logic.
@nf-core-bot fix linting
Hi @KamilMaliszArdigen, yes, I checked and there are negative values. Using --limma_use_voom false skips the plotting, which resolves the issue. Do you think that is a good idea, or could it interfere with the previous steps?
That will not work as expected: voom is responsible for normalising the RNA-seq data, so disabling it means the data are processed without normalisation. I recommend setting --exploratory_log2_assays = '' instead; then the log2 transformation is simply not applied during plot generation.
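As a quick way to confirm the diagnosis before changing parameters, the sketch below counts negative entries per sample in a tab-separated matrix and shows how a log2 transform turns them into NaN. It is an illustration only: both function names and the toy matrix are hypothetical, and it assumes the plotting step applies a plain log2 to the assay values, which would be consistent with the "NaNs produced" warning above but is not confirmed here.

```python
import csv
import io
import math


def count_negative_values(handle, id_column):
    """Count negative numeric entries per sample column in a TSV matrix."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = {name: 0 for name in (reader.fieldnames or []) if name != id_column}
    for row in reader:
        for name in counts:
            try:
                value = float(row[name])
            except (TypeError, ValueError):
                continue  # ignore "NA" or other non-numeric cells
            if value < 0:
                counts[name] += 1
    return counts


def r_like_log2(value):
    """Mimic R's log2: NaN for negative input, -Inf for zero."""
    if value < 0:
        return float("nan")
    if value == 0:
        return float("-inf")
    return math.log2(value)


# Toy matrix with made-up values; voom-style log-scale data can be negative.
matrix = (
    "gene_id\ts1\ts2\n"
    "g1\t5.2\t-0.3\n"
    "g2\t-1.1\t-2.0\n"
    "g3\t0.0\t4.0\n"
)
negatives = count_negative_values(io.StringIO(matrix), "gene_id")
print(negatives)
print(r_like_log2(-1.1))
```

If any count is non-zero, a further log2 transform would introduce missing values, which a density estimate cannot handle.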
This is a limma module update to provide new logic related to mixed models
PR checklist
- nf-core lint
- nf-test test main.nf.test -profile test,docker
- nextflow run . -profile debug,test,docker --outdir <OUTDIR>
- docs/usage.md is updated.
- docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).