pepkit / looper

A job submitter for Portable Encapsulated Projects
http://looper.databio.org
BSD 2-Clause "Simplified" License

summarizer visualization issues #220

Closed nsheff closed 4 years ago

nsheff commented 4 years ago

Related to #122

Found a bunch of issues with the summarizer:

nsheff commented 4 years ago

Perhaps the thumbnail problem was this: https://github.com/pepkit/looper/issues/213#event-2882354301

nsheff commented 4 years ago

@michalstolarczyk, if you have time to glance at any of these issues before you go, we would appreciate it, as you have the best perspective on these things. If you are out of time, that's fine; I will try to take a look at this next week!

nsheff commented 4 years ago

@jpsmith5 I think many of these issues are due to using an outdated version of looper. After Michal updates and releases the next version of looper, can you update your looper?

jpsmith5 commented 4 years ago

Will do and will then update.

stolarczyk commented 4 years ago

@jpsmith5 could you install and test the summarizer in looper v0.12.6-dev (current dev branch)? In case there are some other issues, we will include fixes to these in this release.

jpsmith5 commented 4 years ago

Couple items jump out:

Side note:

nsheff commented 4 years ago

@jpsmith5, can you provide the path to your summary file on rivanna? Is it: file:///project/shefflab/processed/ppqc/ppqc_summary.html

jpsmith5 commented 4 years ago

file:///project/shefflab/processed/peppro/paper/cutadapt/PEPPRO_summary.html

I have also noticed that if a summary file is for an incomplete (still running) project, some of the columns don't show up in the 'Plot a column' area.

See: file:///project/shefflab/processed/peppro/paper/cutadapt/01_06_2020/PEPPRO_summary.html

nsheff commented 4 years ago

Look at this page:

file:///project/shefflab/processed/peppro/paper/cutadapt/reports/adapter_insertion_distribution.html

The links there point to the PNG instead of the PDF, whereas the links on the sample page point to the PDF.

Shouldn't the link targets be PDFs?

jpsmith5 commented 4 years ago

Good catch. Yes, they should link to the PDFs.

stolarczyk commented 4 years ago

@jpsmith5, could you also share the PEP so I can test the changes that I'm making on a more sophisticated project?

jpsmith5 commented 4 years ago

@michalstolarczyk The current one is in the ppqc repo.

jpsmith5 commented 4 years ago

Another oddity: It looks like a pipeline's summarize function is called twice when using looper summarize.

e.g.

Loaded config file: /sfs/lustre/bahamut/scratch/jps3dp/tools/databio/ppqc/peppro_paper_cutadapt.yaml
/sfs/qumulo/qproject/shefflab/processed/peppro/paper/cutadapt/01_06_2020/results_pipeline/Jurkat_ChRO_1/QC_hg38/Jurkat_ChRO_1_preseq_yield.txt/sfs/qumulo/qproject/shefflab/processed/peppro/paper/cutadapt/01_06_2020/results_pipeline/Jurkat_ChRO_2/QC_hg38/Jurkat_ChRO_2_preseq_yield.txt/sfs/qumulo/qproject/shefflab/processed/peppro/paper/cutadapt/01_06_2020/results_pipeline/Jurkat_ChRO_1/QC_hg38/Jurkat_ChRO_1_preseq_counts.txt/sfs/qumulo/qproject/shefflab/processed/peppro/paper/cutadapt/01_06_2020/results_pipeline/Jurkat_ChRO_2/QC_hg38/Jurkat_ChRO_2_preseq_counts.txt
2 of 69 files available
Processing Jurkat_ChRO_1
Processing Jurkat_ChRO_2
INFO: Found real counts for Jurkat_ChRO_1 - Total (M): 21.334642 Unique (M): 13.137206
INFO: Found real counts for Jurkat_ChRO_2 - Total (M): 33.989659 Unique (M): 30.455427

Loaded config file: /sfs/lustre/bahamut/scratch/jps3dp/tools/databio/ppqc/peppro_paper_cutadapt.yaml
/sfs/qumulo/qproject/shefflab/processed/peppro/paper/cutadapt/01_06_2020/results_pipeline/Jurkat_ChRO_1/QC_hg38/Jurkat_ChRO_1_preseq_yield.txt/sfs/qumulo/qproject/shefflab/processed/peppro/paper/cutadapt/01_06_2020/results_pipeline/Jurkat_ChRO_2/QC_hg38/Jurkat_ChRO_2_preseq_yield.txt/sfs/qumulo/qproject/shefflab/processed/peppro/paper/cutadapt/01_06_2020/results_pipeline/Jurkat_ChRO_1/QC_hg38/Jurkat_ChRO_1_preseq_counts.txt/sfs/qumulo/qproject/shefflab/processed/peppro/paper/cutadapt/01_06_2020/results_pipeline/Jurkat_ChRO_2/QC_hg38/Jurkat_ChRO_2_preseq_counts.txt
2 of 69 files available
Processing Jurkat_ChRO_1
Processing Jurkat_ChRO_2
INFO: Found real counts for Jurkat_ChRO_1 - Total (M): 21.334642 Unique (M): 13.137206
INFO: Found real counts for Jurkat_ChRO_2 - Total (M): 33.989659 Unique (M): 30.455427

Basically it has called the same command twice for some reason.

stolarczyk commented 4 years ago

Do you mean that the custom summarizer is run twice here? I don't see this issue in my tests.

jpsmith5 commented 4 years ago

Yes, exactly that. Huhhhh. What would cause it to run twice in my hands but not yours...? Will have to think on that.

stolarczyk commented 4 years ago

FYI, I tested it only with my dummy project with custom summarizers, not the peppro one. But still, it should not behave differently.

stolarczyk commented 4 years ago

OK, I think all the issues and suggestions are addressed (apart from the custom summarizers running twice). Can you test again, Jason?

jpsmith5 commented 4 years ago

Okay:

So the one remaining item is the links to PDFs on object pages.

stolarczyk commented 4 years ago

oh, I thought #222 fixed that. Will look into this. Thanks for testing

jpsmith5 commented 4 years ago

Also, I'm still not sure of the source of the double pipeline summarizer calls.

Manually, à la Rscript tools/PEPPRO_summarizer.R ../ppqc/peppro_paper.yaml, it goes through once. But as soon as I run looper summarize ppqc/peppro_paper.yaml, I get a double run of it.

At least that seems to eliminate the actual R script as the source.

nsheff commented 4 years ago

> oh, I thought #222 fixed that. Will look into this. Thanks for testing

Yeah, it was an attempt, but maybe it failed. It was a shot in the dark; I didn't test it. It would be better if, instead of using array index numbers, we could use named attributes (figure.pdf instead of figure[0]).
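
A minimal sketch of the named-attribute idea, in Python. The Figure record, its field names, and the file names are hypothetical and only for illustration; they are not looper's actual data model.

```python
from collections import namedtuple

# Hypothetical record for one reported figure; field names are assumptions.
Figure = namedtuple("Figure", ["label", "pdf", "png"])


def link_target(figure):
    """Return the file the link should open (the PDF)."""
    return figure.pdf


def thumbnail(figure):
    """Return the image shown inline (the PNG)."""
    return figure.png


# Illustrative values only.
fig = Figure(label="Example plot", pdf="example_plot.pdf", png="example_plot.png")

print(link_target(fig))
print(thumbnail(fig))
```

With named fields, swapping the order in which the PDF and PNG are stored cannot silently change which one a link points to, which is the risk with figure[0].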

stolarczyk commented 4 years ago

2678ade6f0207ae9ca79099f99a1afa998994c2f should fix the pdf linking

jpsmith5 commented 4 years ago

Confirmed!

stolarczyk commented 4 years ago

> Also, I'm still not sure of the source of the double pipeline summarizer calls.
>
> Manually, à la Rscript tools/PEPPRO_summarizer.R ../ppqc/peppro_paper.yaml, it goes through once. But as soon as I run looper summarize ppqc/peppro_paper.yaml, I get a double run of it.
>
> At least that seems to eliminate the actual R script as the source.

maybe now?

nsheff commented 4 years ago

> maybe now?

See #217

stolarczyk commented 4 years ago

Just realized that the status table sorting is not working as expected. Since the time and memory values are strings, they are sorted in plain alphabetical order. So, for example, 16:36:32 is displayed before 2:58:18 after sorting.

I'll turn the sorting back off, then.
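
For reference, a small Python sketch of why the string sort misorders the times and how a numeric key would fix it. The helper name duration_to_seconds is hypothetical, and it assumes every value has the H:MM:SS form shown above; this is not how the status table is actually implemented.

```python
def duration_to_seconds(value):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


times = ["2:58:18", "16:36:32"]

# String comparison looks at the first character, and "1" sorts before "2".
alphabetical = sorted(times)

# Sorting on the converted value orders by actual duration.
numeric = sorted(times, key=duration_to_seconds)

print(alphabetical)
print(numeric)
```

The same idea would apply to the memory column, given a converter suited to however those values are formatted.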

jpsmith5 commented 4 years ago

Can confirm the summarizer for the pipeline is now running only once.