merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] Error with p-values in functional enrichment #1828

Closed philipwoods closed 2 years ago

philipwoods commented 3 years ago

Short description of the problem

I get errors about invalid p-values when running anvi-compute-functional-enrichment-in-pan with the --include-gc-identity-as-function option.

anvi'o version

Anvi'o .......................................: hope (v7.1)

Profile database .............................: 38
Contigs database .............................: 20
Pan database .................................: 15
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 2
tRNA-seq database ............................: 2

System info

OS is Red Hat Enterprise Linux release 8.3 (Ootpa). Anvi'o is installed as a conda environment.

Detailed description of the issue

I ran the following command:

anvi-compute-functional-enrichment-in-pan -g ANME-3-EVO-GENOMES.db -p pangenomics/ANME3EVO-PAN.db --category-variable group --include-gc-identity-as-function -o function-enrichment/enrichment-IDENTITY.tsv -F function-enrichment/occurrence-IDENTITY.tsv --annotation-source IDENTITY

The occurrence table file was written successfully, but the enrichment analysis failed with an error. The log file for the error is copied below:

# DATE: 28 Oct 21 10:49:20
# CMD LINE: anvi-script-enrichment-stats --input /export/data1/tmp/tmpi6r4xx2z --output function-enrichment/enrichment-IDENTITY.tsv
Note: Using an external vector in selections is ambiguous.
ℹ Use `all_of(n_columns_before_data)` instead of `n_columns_before_data` to silence this message.
ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session.
Error: Problem with `mutate()` column `adjusted_q_value`.
ℹ `adjusted_q_value = ...$NULL`.
✖ ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use a different range of lambda.
Backtrace:
     █
  1. ├─`%>%`(...)
  2. ├─dplyr::select(...)
  3. ├─dplyr::left_join(., df_in, by = "accession")
  4. ├─dplyr::mutate(...)
  5. ├─dplyr:::mutate.data.frame(...)
  6. │ └─dplyr:::mutate_cols(.data, ..., caller_env = caller_env())
  7. │   ├─base::withCallingHandlers(...)
  8. │   └─mask$eval_all_mutate(quo)
  9. ├─qvalue::qvalue(...)
 10. │ └─qvalue::pi0est(p, ...)
 11. │   └─base::stop("ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use a different range of lambda.")
 12. └─base::.handleSimpleError(...)
 13.   └─dplyr:::h(simpleError(msg, call))
Execution halted
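
Note on the error: the backtrace shows the failure is raised inside qvalue::pi0est, which estimates pi0, the proportion of tests for which the null hypothesis is true. The sketch below is a simplified Python illustration of Storey's estimator, not the qvalue package's code and not a diagnosis of this particular input. The function names, the `>=` comparison, and the example p-values are assumptions made for illustration. It shows one way the estimate can reach zero: when no p-value is at or above the chosen lambda.

```python
from typing import Dict, Optional, Sequence


def storey_pi0(p_values: Sequence[float], lam: float) -> float:
    """Storey-style estimate of the null proportion at a single lambda.

    pi0(lambda) = #{p >= lambda} / (m * (1 - lambda))

    Illustrative only: conventions differ on > versus >=, and the R
    package may additionally smooth the estimate over several lambdas.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lambda must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    n_at_or_above = sum(1 for p in p_values if p >= lam)
    return n_at_or_above / (m * (1.0 - lam))


def pi0_over_grid(
    p_values: Sequence[float],
    lambdas: Optional[Sequence[float]] = None,
) -> Dict[float, float]:
    """Evaluate the estimator over a grid of lambdas (assumed 0.05..0.95)."""
    if lambdas is None:
        lambdas = [round(0.05 * k, 2) for k in range(1, 20)]
    return {lam: storey_pi0(p_values, lam) for lam in lambdas}


# Hypothetical p-values, all very small: nothing lies at or above any
# lambda in the grid, so every estimate is zero.
skewed = [0.001, 0.002, 0.01, 0.03]
print(pi0_over_grid(skewed))

# Hypothetical p-values spread across [0, 1]: the estimate is positive.
spread = [0.1, 0.3, 0.5, 0.7, 0.9]
print(storey_pi0(spread, 0.5))
```

If the p-values computed from the attached input are concentrated near zero in a similar way, a zero estimate would be consistent with the message "The estimated pi0 <= 0", but that would need to be checked against the actual values.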

Files to reproduce

I have attached the temp file that gets passed to anvi-script-enrichment-stats. Running the script directly with this file as input produces the same error as above.

IDENTITY-input.txt

meren commented 3 years ago

@adw96 does this make any sense to you by any chance?

adw96 commented 3 years ago

Thanks for tagging me, @meren! @philipwoods I can confirm that I can reproduce the issue and hope to return with a fix or update tomorrow. Thank you for providing the temp file!