oicr-gsi / djerba

Create cancer bioinformatics reports from metadata and workflow output
GNU General Public License v3.0
9 stars 3 forks source link

COLO829 demo #360

Open iainrb opened 7 months ago

iainrb commented 7 months ago

This issue is to track and discuss any install/run issues related to the COLO829 cross-site demo of the wgts.snv_indel plugin.

MareikeJaniak commented 5 months ago

Hi Iain,

My name is Mareike Janiak and I'm a bioinformatics analyst at C3G in Montreal. I'm working with Hector Galvez on testing out djerba for use in our pipeline for the MoH project.

I tried out the wgts.snv_indel plugin with one of the MAF files that we produced with PCGR, but I'm getting an error that I think is related to missing information in the MAF file (tumor depth). In the demo instructions, there is a reference to example config and maf files for COLO829. Would it be possible to share those with us so we can test the plugin? My email is mareike.janiak [at] computationalgenomics.ca.

Thanks! Best, Mareike

iainrb commented 5 months ago

Hi Mareike. Thanks for reaching out -- yes I can provide those files, I'll send them to you tomorrow. Best regards.

MareikeJaniak commented 5 months ago

Hi Iain, Thanks for sharing the files! I was able to reproduce the demo report with the input files you shared.

There were a few errors that I ran into along the way. I found workarounds for them, but they're more bandaids than actual solutions, so I was hoping for your input to find an actual solution.

  1. The first error was with the subprocess.run call for MafAnnotator.py from oncokb-annotator. I'm not sure if it's because I didn't install oncokb-annotator properly (the instructions on their github are somewhat sparse), but even after adding the install location to my $PATH, calling MafAnnotator.py -h would return a syntax error. Running python /path/to/MafAnnotator.py -h works, however. I modified djerba accordingly to allow it to run, but if you have any suggestions for getting the original call to work, let me know!

  2. The second error was with the subprocess.run call for process_snv_data.r. The issue here seemed to be with subprocess. The r script runs fine when called directly (as long as the whizbam url is in quotes) but when run with subprocess.run, I received the following error:

    'Error in readRDS(pfile) : 
    cannot read workspace version 3 written by R 4.0.3; need R 3.5.0 or newer
    Calls: library -> find.package -> lapply -> FUN -> readRDS

    However, when run with subprocess.call(logged_command, shell=True) it worked, as long as the whizbam url was in quotes within the logged_command. I modified djerba accordingly to allow it to run. This issue may have something to do with our R installation(s). The R env returned when run directly and when run via subprocess.run were quite different, for example.

  3. The subprocess.run call to vaf_plot.r returned the same error as above, so I added the same fix. I also had to export DJERBA_BASE_DIR.

Finally, I'm curious how the "ANALYSIS DESCRIPTION" section is populated? We'd eventually like to be able to customize it.

Best, Mareike

iainrb commented 5 months ago

Hi Mareike

I'm glad you could generate the report, that's a great first step. :smiley:

The issues with oncokb-annotator and R are not surprising, calling external programs is always somewhat fragile. We use environment modules and a lot of custom build scripts to set up our production environment, which doesn't transfer easily to other sites. Could you try running djerba.py --debug --log-path my_log_file.log without your modifications, and attach the output?

We are also looking at making Djerba more portable for our own use, eg. to allow development and testing on a laptop, I'll keep you informed.

Yes, DJERBA_BASE_DIR is required. Some plugins also need DJERBA_PRIVATE_DIR and tests require DJERBA_TEST_DIR although I don't think you'll need these for now. I'll add that to the documentation.

I think you mean "ASSAY DESCRIPTION"? This is done by the supplement.body plugin. At the moment it supports 6 different assays used at OICR, but there is definitely scope to customize further. The options for each assay are defined in the plugin HTML template: https://github.com/oicr-gsi/djerba/blob/3aa34358207a86a513b8af680dc0fadf6faa1cce/src/lib/djerba/plugins/supplement/body/supplementary_materials_template.html#L22

We also configure workflow links and version numbers in a separate Python file: https://github.com/oicr-gsi/djerba/blob/main/src/lib/djerba/plugins/supplement/body/versioning.py

Hope that helps! Have a good weekend,

Iain

MareikeJaniak commented 5 months ago

Hi Iain,

I ran the debug command you suggested on the unmodified install of v1.5.6 but I'm getting the following error:

(DJERBA_ENV) [mjaniak@abacus1 mcj_test2]$ djerba.py --debug --log-path mcj_log_file.log
Traceback (most recent call last):
  File "/lb/project/mugqic/projects/MoH_Djerba_development/mcj_test2/DJERBA_ENV/bin/djerba.py", line 71, in <module>
    ap = arg_processor(args)
         ^^^^^^^^^^^^^^^^^^^
  File "/lb/project/mugqic/projects/MoH_Djerba_development/mcj_test2/DJERBA_ENV/lib/python3.11/site-packages/djerba/util/args.py", line 24, in __init__
    self.validate_args(self.args)  # checks subparser and args are valid
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lb/project/mugqic/projects/MoH_Djerba_development/mcj_test2/DJERBA_ENV/lib/python3.11/site-packages/djerba/core/main.py", line 626, in validate_args
    raise ValueError("Unknown subparser: " + args.subparser_name)
                     ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
TypeError: can only concatenate str (not "NoneType") to str

The only output in the log file is: 2024-05-13_10:31:15 djerba.util.args INFO: Validating paths in command-line arguments

Best, Mareike

iainrb commented 5 months ago

It seems you've uncovered a bug in Djerba error handling, we'll fix that.

But I think the underlying error is a missing subparser name -- ie. you've called Djerba without specifying a mode. You need to do something like djerba.py --debug --log-path my_logfile.log report -i config.ini -o output. If you leave out report you might get an error similar to what you've shown.

Does that help?

MareikeJaniak commented 5 months ago

Hi Iain, I thought it might be something like that! Sorry, I misunderstood your earlier instructions, I will rerun it with the report mode. Will report back!

MareikeJaniak commented 5 months ago

I'm attaching the log file here. The process stops here when it gets to MafAnnotator.py because of the syntax error I was describing. Changing the command to python /path/to/MafAnnotator.py fixes this but it would be great if it worked as intended. I assume my installation of oncokb-annotator isn't quite right or it's a python version conflict?

mcj_logfile.log

iainrb commented 5 months ago

Hi again. Actually this is seems to be an issue with OncoKB annotator, the shebang line in MafAnnotator.py calls #!/usr/bin/python which won't be correct and might not even be Python 3!

In our build script for oncokb-annotator we wrap MafAnnotator.py in a custom shell script as follows:

#!/bin/sh

exec python3 $(dirname $0)/../share/oncokb-annotator/MafAnnotator.py "$@"

Our wrapper script is also called MafAnnotator.py, which is convenient but a little confusing. Sorry, I'd forgotten this was necessary -- you could take a similar approach yourself and test it by running MafAnnotator.py on its own. Let me know how it works. :smiley:

MareikeJaniak commented 5 months ago

Hi Iain, Got it! Yes, that makes sense! I will make a wrapper script and let you know how it goes. Thanks for all of your help!

MareikeJaniak commented 5 months ago

The wrapper script did the trick! The next error is with the subprocess call to the R script. The error seems to be with loading libraries as part of the R script when calling it with subprocess.run. I was able to resolve the error by changing the scripts to use subprocess.call instead and passing the command as a single string, with shell=True. Calling the script directly also runs fine. I suspect that the error has something to do with the environments being different when running R interactively, as opposed to with subprocess.run, so R doesn't manage to look for packages in the right place. Calling Sys.getenv() yields different results, for example. I experimented with setting various paths explicitely (R_LIBS_SITE, R_INCLUDE_DIR, etc) but nothing worked until I tried subprocess.call.

The new log file is here: mcj_logfile2.log

iainrb commented 5 months ago

Hi. I'm not sure what the exact issue is, but this forum post suggests it may be an .Rdata file in your working directory. Easy enough to check. https://forum.posit.co/t/error-in-readrds-file-cannot-read-workspace-version-3-when-using-multiple-r-installations/56632/6

If that doesn't work, I'll continue helping to troubleshoot. :smile:

MareikeJaniak commented 5 months ago

Yes, I found that thread too and was hoping that would be the solution :) I haven't found an .Rdata file, but the system environment returned by Sys.getenv() is very different within subprocess.run(), so I suspect that it's some kind of conflict of environments. I'm trying to load R as one of our installed modules, but it seems that the subprocess.run() call tries to use another installation, perhaps?

I'm working with some of my colleagues to investigate this further!

MareikeJaniak commented 4 months ago

Hi Iain, I just wanted to follow up after our recent meeting. During the meeting you mentioned that you're working on some potential solutions to the issues we're having with the R calls in subprocess. Have you had a chance to look into this at all?

The second task before our next meeting was to look into personalizing the djerba outputs a bit for the different institutions. I've managed to make a draft version of the report with the McGill logo and information in the header, but we would probably also like to be able to change the assay description to match what we do in our pipeline.

Let me know if you'd like to set up a more technical meeting to discuss some of this or if it would be easier for you if I make a fork of djerba to do some of the development on my side?

iainrb commented 4 months ago

Hi Mareike

Sorry for not replying earlier. I'm afraid I haven't had time to look at the R calls issue.

The assay description text is here: https://github.com/oicr-gsi/djerba/blob/main/src/lib/djerba/util/assays.py We can look at making it more configurable if there's a need for it.

A technical meeting to catch up sounds good, I'll check my schedule and maybe we can arrange something next week. In the meantime, I think forking the code for your own development is a good idea, if you haven't already done so.

nodrogluap commented 3 days ago

FYI as it relate to oncokb-annotator's shebang line, I've put a pull request in: https://github.com/oncokb/oncokb-annotator/pull/222

Whether they merge it or not is another question :-)

iainrb commented 2 days ago

[like] Iain Bancarz reacted to your message:


From: Paul Gordon @.> Sent: Tuesday, October 15, 2024 10:24:01 PM To: oicr-gsi/djerba @.> Cc: Iain Bancarz @.>; Author @.> Subject: Re: [oicr-gsi/djerba] COLO829 demo (Issue #360)

FYI as it relate to oncokb-annotator's shebang line, I've put a pull request in: oncokb/oncokb-annotator#222https://github.com/oncokb/oncokb-annotator/pull/222

Whether they merge it or not is another question :-)

— Reply to this email directly, view it on GitHubhttps://github.com/oicr-gsi/djerba/issues/360#issuecomment-2415273893, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAM52VLWVRUWCI7YMYBBJ6LZ3WIYDAVCNFSM6AAAAABDVMYPR6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJVGI3TGOBZGM. You are receiving this because you authored the thread.Message ID: @.***>