nanoporetech / remora

Methylation/modified base calling separated from basecalling.

Error while installing remora #22

Closed AzlanNI closed 2 years ago

AzlanNI commented 2 years ago

Hello Everyone,

I am currently trying to install Remora and the Bonito basecaller on our HPC. I am using the pip install command, but I always get this error:

      ############################
      # Package would be ignored #
      ############################
      Python recognizes 'remora.trained_models' as an importable package, however it is
      included in the distribution as "data".
      This behavior is likely to change in future versions of setuptools (and
      therefore is considered deprecated).

      Please make sure that 'remora.trained_models' is included as a package by using
      setuptools' `packages` configuration field or the proper discovery methods
      (for example by using `find_namespace_packages(...)`/`find_namespace:`
      instead of `find_packages(...)`/`find:`).

      You can read more about "package discovery" and "data files" on setuptools
      documentation page.

  !!

    check.warn(importable)
  error: command 'icc' failed: No such file or directory
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for ont-remora
Failed to build ont-remora
ERROR: Could not build wheels for ont-remora, which is required to install pyproject.toml-based projects

Maybe this is a known issue, or someone can help me out. I am currently using a PyPI mirror since the HPC has no internet connection.

I would appreciate any help!

kind regards,

Azlan

marcus1487 commented 2 years ago

I've not encountered this error on installation. Is this the entire error message? If not, would you mind sending the complete error message? Could you also share your OS and Python versions and the full command submitted?

To venture a guess though, this does appear to be a system compiler issue. icc is the Intel compiler, which Python is identifying as the C compiler configured for use on this system. Remora requires portions of the code to be compiled and thus requires a valid compiler. You may have some luck setting the compiler at Remora install time (e.g. CC=/path/to/gcc pip install ont-remora).
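As a rough way to check the compiler guess above, the following Python sketch reports which C compiler a build would most likely pick up and whether that executable can actually be found. It assumes the usual setuptools behaviour on Linux, where the CC environment variable takes precedence over the compiler recorded when Python itself was built; the function names are made up for this illustration.

```python
import os
import shutil
import sysconfig


def effective_compiler(environ=None):
    """Return the C compiler executable a build would likely use, or None.

    Assumption: the CC environment variable, if set, overrides the value
    recorded in this Python's build configuration.
    """
    environ = os.environ if environ is None else environ
    cc = environ.get("CC") or sysconfig.get_config_var("CC")
    # CC may carry flags (e.g. "gcc -pthread"); the executable is the first token.
    return cc.split()[0] if cc else None


def compiler_available(name):
    """True if `name` resolves to an executable (on PATH or as a direct path)."""
    return bool(name) and shutil.which(name) is not None


if __name__ == "__main__":
    cc = effective_compiler()
    print(f"compiler: {cc!r}, available: {compiler_available(cc)}")
```

If this reports icc as unavailable, that would be consistent with the "command 'icc' failed: No such file or directory" line in the log, and pointing CC at an installed compiler as suggested above would be the next thing to try.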

AzlanNI commented 2 years ago

Hello @marcus1487

The OS on our HPC cluster is Linux and I am using Python 3.8.3.

Since we don't have internet access from the HPC, I used a PyPI mirror and ran the following command:

PIP_CONFIG_FILE=/software/python/pip.conf pip install --user ont-bonito

So one solution could be to load a newer compiler such as intel/xe2020.4 and try the pip install command again?

Thanks for your help!

cjw85 commented 2 years ago

Hi @AzlanNI

Is there a particular reason you want to use bonito for basecalling? You may wish to look at the production Guppy basecaller which implements a near identical algorithm to that used in bonito (a slightly earlier version of the remora algorithm).

AzlanNI commented 2 years ago

I am using these basecallers to detect modified DNA bases in a CpG context in cfDNA. I just saw a presentation from ONT in which they showed that the Remora models are better at detecting modified bases since they don't sacrifice basecalling accuracy for canonical bases. This is the main reason I wanted to use the Megalodon or Bonito basecaller: to use the Remora models.

cjw85 commented 2 years ago

Can you provide details of that presentation? It sounds like it needs updating. It is no longer the case with Guppy that asking it to perform modified base calling of CpG will lead to lower canonical base accuracy: Guppy has used the Remora algorithm since v6.1.1 (see https://community.nanoporetech.com/downloads/guppy/release_notes).

AzlanNI commented 2 years ago

I watched "London Calling 2022: Update from Oxford Nanopore Technologies", from which I understood that using the Remora models increases the accuracy of modified base calls. But maybe I misunderstood, and Remora on its own is not the best option for modified base calling. If both are equal in strength and accuracy, then Guppy would be the better choice, since we are currently using Guppy 5.0.7 on the HPC. That version is rather outdated, though, so maybe we should update to the newest version.

AzlanNI commented 2 years ago

Did I mix things up with Guppy and the Remora models? I ask because, sadly, my Bonito basecaller is still not working on the HPC cluster.

AzlanNI commented 2 years ago

I also tried the Megalodon basecaller, but there I always get the error: ERROR: Guppy version string does not match expected pattern: "b'Intel MKL FATAL ERROR: Cannot load /software/guppy/5.0.7/cpu/bin/guppy_basecall_server.\n'"

I think this could also be because I am trying to use the newest version of Megalodon (2.5) with Guppy version 5.0.7.

marcus1487 commented 2 years ago

The Remora algorithms are now the backend for all modified base calling across the different basecaller implementations (megalodon/bonito/guppy). Megalodon and Bonito directly use the implementations from the Remora Python package, but these may be less stable as they are research demonstrators. The implementation in Guppy is the recommendation, but newer features may lag behind the research basecallers. The next version of Guppy will add support for version 1 Remora models (higher accuracy with a signal re-scaling stage).

Note that Guppy > 6.1 is required for running Remora models within Guppy.
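To make the version requirement concrete, here is a small Python sketch that compares a Guppy version string against a minimum. The threshold of 6.1.1 is taken from the release mentioned earlier in this thread; the helper names are hypothetical and this is not part of any Guppy or Remora API.

```python
import re


def parse_version(text):
    """Extract the first dotted numeric version (e.g. '6.1.7') as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def supports_remora(version_text, minimum=(6, 1, 1)):
    """True if the version is at least `minimum` (assumed threshold: 6.1.1)."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    for version in ("5.0.7", "6.1.7"):
        print(version, supports_remora(version))
```

With the versions discussed here, 5.0.7 falls below the threshold and 6.1.7 is above it.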

AzlanNI commented 2 years ago

Alright, I got it! If Guppy is the recommendation for modified base calling, then maybe we should just update the Guppy version on the HPC. As already said, the premise of using Bonito and Megalodon was to use the Remora models, since we thought that Guppy was sacrificing canonical base accuracy. But can you currently use Remora models in Guppy version > 6.1?

Thanks a lot for the information and help!

cjw85 commented 2 years ago

But can you currently use Remora models in Guppy version > 6.1?

Correct. If you update Guppy, run with the configuration dna_r9.4.1_450bps_modbases_5mc_cg_sup_prom.cfg (or similar), and use the --bam_out and --align_ref options, Guppy will output BAM data of aligned reads annotated with the tags for modified bases defined in the SAM specification.
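As an illustration of the invocation described above, this Python sketch assembles such a command line without running it. The configuration name and the --bam_out and --align_ref options come from this thread; --config, --input_path and --save_path are, to my understanding, the standard guppy_basecaller options for the config file and the input and output directories. All paths are placeholders.

```python
import shlex


def build_guppy_command(input_dir, output_dir, reference,
                        config="dna_r9.4.1_450bps_modbases_5mc_cg_sup_prom.cfg"):
    """Build (but do not run) a guppy_basecaller command for modified base calling."""
    return [
        "guppy_basecaller",
        "--config", config,          # modbases configuration from the thread
        "--input_path", input_dir,   # placeholder: directory with raw reads
        "--save_path", output_dir,   # placeholder: output directory
        "--bam_out",                 # emit BAM output
        "--align_ref", reference,    # placeholder: reference to align against
    ]


if __name__ == "__main__":
    cmd = build_guppy_command("fast5/", "basecalls/", "reference.fa")
    print(shlex.join(cmd))
```

The printed line can be pasted into a job script, or the list can be passed to subprocess.run once the paths point at real data.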

AzlanNI commented 2 years ago

Alright, I will try using the Remora models in Guppy ASAP. Is there a command to see the Remora models that are accessible to Guppy 6.1.7?

AzlanNI commented 2 years ago

We now have Guppy 6.1.7 installed on the HPC and I wanted to test some Remora models for detecting modified bases. Is there a listing of the custom tags for the models, or a list in which I could see which model would be the best match? Using guppy_basecaller --print_workflow, I don't see any modbases models.

cjw85 commented 2 years ago

I found the reference to dna_r9.4.1_450bps_modbases_5mc_cg_sup_prom.cfg simply by digging around in the data directory of the guppy installation. I'm not sure how one is supposed to do this but here is a listing of all the configuration files:

dna_r10.4_e8.1_modbases_5hmc_5mc_cg_fast.cfg
dna_r10.4_e8.1_modbases_5hmc_5mc_cg_fast_prom.cfg
dna_r10.4_e8.1_modbases_5hmc_5mc_cg_hac.cfg
dna_r10.4_e8.1_modbases_5hmc_5mc_cg_hac_prom.cfg
dna_r10.4_e8.1_modbases_5hmc_5mc_cg_sup.cfg
dna_r10.4_e8.1_modbases_5mc_cg_fast.cfg
dna_r10.4_e8.1_modbases_5mc_cg_fast_prom.cfg
dna_r10.4_e8.1_modbases_5mc_cg_hac.cfg
dna_r10.4_e8.1_modbases_5mc_cg_hac_prom.cfg
dna_r10.4_e8.1_modbases_5mc_cg_sup.cfg
dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_fast.cfg
dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_fast_prom.cfg
dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_hac.cfg
dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_hac_prom.cfg
dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_sup.cfg
dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_sup_prom.cfg
dna_r9.4.1_450bps_modbases_5mc_cg_fast.cfg
dna_r9.4.1_450bps_modbases_5mc_cg_fast_prom.cfg
dna_r9.4.1_450bps_modbases_5mc_cg_hac.cfg
dna_r9.4.1_450bps_modbases_5mc_cg_hac_prom.cfg
dna_r9.4.1_450bps_modbases_5mc_cg_sup.cfg
dna_r9.4.1_450bps_modbases_5mc_cg_sup_prom.cfg
dna_r9.4.1_e8.1_modbases_5mc_cg_fast.cfg
dna_r9.4.1_e8.1_modbases_5mc_cg_fast_prom.cfg
dna_r9.4.1_e8.1_modbases_5mc_cg_hac.cfg
dna_r9.4.1_e8.1_modbases_5mc_cg_hac_prom.cfg
dna_r9.4.1_e8.1_modbases_5mc_cg_sup.cfg

The only ones likely of interest to you are the dna_r9.4.1... ones. The others are not widely released chemistries.
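For anyone wanting to reproduce a listing like the one above, a minimal Python sketch is to glob the data directory of the Guppy installation for modbases configuration files. The directory path below is a placeholder, since the actual location depends on how Guppy was installed, and the function name is made up for this example.

```python
from pathlib import Path


def list_modbase_configs(data_dir, prefix="dna_r9.4.1"):
    """Return sorted names of modbases .cfg files in data_dir that start with prefix."""
    return sorted(
        path.name
        for path in Path(data_dir).glob(f"{prefix}*modbases*.cfg")
    )


if __name__ == "__main__":
    # Placeholder path: replace with the data directory of your Guppy installation.
    for name in list_modbase_configs("/path/to/guppy/data"):
        print(name)
```

The default prefix restricts the result to the dna_r9.4.1 configurations; passing prefix="dna_" would list the other chemistries as well.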

AzlanNI commented 2 years ago

Great, thanks! I had been trying to find a way to list them. Can you tell me if there is documentation that shows what the custom tags mean? For example, hac means high accuracy. So what do prom and sup mean? Thanks for your help!

cjw85 commented 2 years ago

fast: fast basecaller
hac: high accuracy basecaller
sup: super accuracy basecaller
prom: PromethION
(lack of) prom: MinION/GridION
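The tags above can be pulled out of a configuration file name mechanically. The following Python sketch does this with a regular expression written against the file names listed in this thread. The field names "pore", "variant" and "mods" are my own labels for the remaining parts of the name, not official terminology, and the pattern is only known to fit the names shown here.

```python
import re

# Pattern inferred from the modbases file names listed in this thread.
CONFIG_PATTERN = re.compile(
    r"^dna_(?P<pore>r[\d.]+)_(?P<variant>[^_]+)_modbases_"
    r"(?P<mods>.+)_(?P<accuracy>fast|hac|sup)(?P<prom>_prom)?\.cfg$"
)

ACCURACY = {
    "fast": "fast basecaller",
    "hac": "high accuracy basecaller",
    "sup": "super accuracy basecaller",
}


def describe_config(name):
    """Split a modbases config file name into its tags (labels are assumptions)."""
    match = CONFIG_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised config name: {name!r}")
    return {
        "pore": match["pore"],
        "variant": match["variant"],
        "mods": match["mods"],
        "accuracy": ACCURACY[match["accuracy"]],
        # Per the explanation above: "_prom" means PromethION, otherwise MinION/GridION.
        "device": "PromethION" if match["prom"] else "MinION/GridION",
    }


if __name__ == "__main__":
    print(describe_config("dna_r9.4.1_450bps_modbases_5mc_cg_sup_prom.cfg"))
```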

The Guppy user guide can be found in the Nanopore community: https://community.nanoporetech.com/docs/prepare/library_prep_protocols/Guppy-protocol/v/gpb_2003_v1_revae_14dec2018