Closed WackerO closed 8 months ago
E.g. see this report: SRP254919_zero.html.zip
nf-core lint
overall result: Failed :x:Posted for pipeline commit ce334fc
+| ✅ 244 tests passed |+
#| ❔ 1 tests were ignored |#
-| ❌ 30 tests failed |-
I'm not sure if the example report was to only show how the new p-value=0 threshold on the volcano plot would look like or to show a complete working example. If the former, then it looks great. If the latter, there are a few issues:
After reviewing the code, I couldn't find the reason for these issues, but I'm also not familiar with this codebase.
@kdivilov It was mainly there for showing what the plot looks like, but the example is also working actually. RP23-4H14.1 is not DE according to the tables, why would you expect it to be marked as DE? I just had a look at the DE table, its entry is
gene_id gene_name gene_biotype pvalue padj log2FoldChange differential_status y_data
ENSMUSG00000107358 RP23-4H14.1 TEC 0.05018516 0.9026969 -0.06082605 Not significant 0.0444580489717465
Also, sorry for the confusion, the tables below the plots are rounded. The entries where it says padj 0 aren't actually 0, but a very small number. I've attached the report again, this time just without rounding:
I'm not 100% sold on this. The data are fundamentally outside the axes, and inventing a p value ceiling such that we can show such points as a smear at the extremes of the plot doesn't add much to my mind.
The tables are there to see the gene lists. @WackerO we might want to refine the rounding logic you added to the reporting to use scientific notation rather than marking non-0 p values as 0.
I'm not 100% sold on this. The data are fundamentally outside the axes, and inventing a p value ceiling such that we can show such points as a smear at the extremes of the plot doesn't add much to my mind.
The tables are there to see the gene lists. @WackerO we might want to refine the rounding logic you added to the reporting to use scientific notation rather than marking non-0 p values as 0.
I can definitely change the rounding so that the table is clearer, but I would like to argue in favor of the ceiling. This is something that is for example also done by gprofiler (clicking on random example
, thenrun query
will produce a plot that's capped at 16).
I think the plots are there to visualize the data that are listed in the tables, and if some entries are frequently missing, IMO the visualization is lacking (especially if those missing points are the ones with the smallest p value).
Do you have a specific reason why you don't want these points added to the plot? @pinin4fjords
Because such points can't be represented properly, they are by definition outside the plot boundaries.
You can't just define an arbitrary ceiling- another point could come along with a lower p value than the ceiling. So you'd probably have to set a ceiling equal to the smallest non-zero p value in the data (or, say half the lowest p value). Then you end up distorting the plot for the points you can represent.
I'd much rather just show a warning that there are points with p values of 0 that can't be shown- and the user can go find those in the tables.
In response to https://github.com/nf-core/differentialabundance/issues/227
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).