jeffhussmann / knock-knock

toolkit for analyzing genome editing experiments
GNU General Public License v3.0

Debug v.0.4.2 with example data #14

Open hukai916 opened 1 year ago

hukai916 commented 1 year ago

Hi Jeff,

With your updated documentation, it is much easier to reproduce the example analysis using knock-knock. Below I summarize the bugs, issues, and concerns I found while exploring knock-knock with the example data (PacBio) and the latest documentation (commit: 0c347a33e24754e1d564348cf82b8ff1308bd92a). I hope this saves you some time when you update knock-knock.

  1. Installation: setup.py requires "hits v0.4.1", which is not available from PyPI. https://github.com/jeffhussmann/knock-knock/blob/0c347a33e24754e1d564348cf82b8ff1308bd92a/setup.py#L53

  2. Install example dataset:

    • Command used:
      knock-knock install_example_data PROJECT_DIR
    • Issue1: the command in the documentation is out of date; the correct one is:
      knock-knock install-example-data PROJECT_DIR
    • Issue2: the following missing libraries need to be installed first:
      conda install -c anaconda seaborn
      conda install -c anaconda scipy
      conda install -c anaconda statsmodels
  3. Build targets:

    • Command used:
      knock-knock build_targets PROJECT_DIR
    • Issue1: the command in the documentation is out of date; the correct one is:
      knock-knock build-targets PROJECT_DIR
  4. The parallel command

    • Command used:
      knock-knock parallel PROJECT_DIR 4 --group pacbio
    • Error: AttributeError: 'PacbioExperiment' object has no attribute 'generate_alignments'. Did you mean: 'get_read_alignments'?
    • So I decided to test the knock-knock process command first. It hit the same error, so I tested each --stage separately.
  5. The process --stage preprocess command: works well

  6. The process --stage align command:

  7. The process --stage categorize command:

    • Command used:
      knock-knock process PROJECT_DIR pacbio R_PCR --stage categorize
    • Error: 'PacbioExperiment' object has no attribute 'uncommon_read_type'
    • Solution: add self.uncommon_read_type = 'CCS' into "pacbio_experiment.py"
  8. The process --stage generate_figure command:

  9. The knock-knock table command:

    • Issue1: the documented command should be knock-knock table PROJECT_DIR, not knock-knock table BASE_DIR
    • Concern1: although I am testing with the PacBio dataset only, this command assumes that both Illumina and PacBio results are in place. I had to remove the data/illumina folder to stop it from complaining.
    • Error1: missing "exp.batch"
    • Solution: change "exp.batch" to "exp.batch_name" in the "experiment.py" script
    • Error2: AttributeError: 'PacbioExperiment' object has no attribute 'experiment_group'
    • Solution: I worked around it by commenting out the corresponding lines in "table.py"; I am not sure whether this is a legitimate fix: https://github.com/jeffhussmann/knock-knock/blob/0c347a33e24754e1d564348cf82b8ff1308bd92a/knock_knock/table.py#L672-L673
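
The attribute workarounds from steps 7 and 9 can be sketched roughly as follows. This is only an illustration under stated assumptions: the `PacbioExperiment` class here is a simplified stand-in (the real constructor and attributes in knock-knock differ), `describe` is a hypothetical helper rather than a function from "table.py", and treating `pacbio` as the batch name and `R_PCR` as the sample name is a guess based on the commands above.

```python
# Hypothetical stand-in for knock-knock's PacbioExperiment; the real class
# takes different constructor arguments and carries much more state.
class PacbioExperiment:
    def __init__(self, batch_name, sample_name):
        self.batch_name = batch_name
        self.sample_name = sample_name
        # Step 7 workaround: define the attribute the categorize stage looks up.
        self.uncommon_read_type = 'CCS'


def describe(exp):
    """Hypothetical helper: read attributes defensively instead of raising AttributeError."""
    # Step 9, Error1: prefer batch_name, falling back to an older 'batch' attribute if present.
    batch = getattr(exp, 'batch_name', getattr(exp, 'batch', None))
    # Step 9, Error2: tolerate a missing experiment_group rather than failing.
    group = getattr(exp, 'experiment_group', None)
    return {
        'batch': batch,
        'group': group,
        'uncommon_read_type': exp.uncommon_read_type,
    }


exp = PacbioExperiment('pacbio', 'R_PCR')
print(describe(exp))
```

Using `getattr` with a default keeps the table code running when an attribute is absent, but whether `None` is an acceptable value for `experiment_group` downstream is something only the maintainer can confirm.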

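For the missing libraries in step 2, a quick check like the one below could be run before calling knock-knock. It is a minimal sketch using only the standard library; the `missing_packages` helper and the `REQUIRED` list are hypothetical names for illustration and are not part of knock-knock.

```python
import importlib.util

# Packages reported as missing in step 2.
REQUIRED = ['seaborn', 'scipy', 'statsmodels']


def missing_packages(names):
    """Return the top-level package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_packages(REQUIRED)
    if missing:
        print('Missing packages:', ', '.join(missing))
    else:
        print('All required packages are importable.')
```
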
After the above steps, I could successfully generate results from the example dataset, which are saved here: https://www.dropbox.com/sh/21n95nh0quvom4i/AACjjtxzC3iXeXoFP-lIXPEra?dl=0. Could you take a look and let me know whether the results look correct?

  10. Another concern: I noticed some hard-coded genomes in "layout.py" (one example is linked below). Does this mean other custom genomes are not supported? https://github.com/jeffhussmann/knock-knock/blob/0c347a33e24754e1d564348cf82b8ff1308bd92a/knock_knock/layout.py#L426-L431C15

Thank you for looking into this. If you are willing to review, I am happy to create a pull request with all the changes. Let me know.

--Kai