jeffhussmann / knock-knock

toolkit for analyzing genome editing experiments
GNU General Public License v3.0

Debug v.0.4.2 with example data #14

Open hukai916 opened 8 months ago

hukai916 commented 8 months ago

Hi Jeff,

With your updated documentation, it is much easier to repeat the example analysis using knock-knock. Below I summarize the bugs, issues, and concerns that I found while exploring knock-knock with the example data (Pacbio) and the latest documentation (commit: 0c347a33e24754e1d564348cf82b8ff1308bd92a). I hope this saves you some time when you update knock-knock.

  1. installation: requires "hits v0.4.1", which is not available from PyPI.

  2. Install example dataset:

    • Command used:
      knock-knock install_example_data PROJECT_DIR
    • Issue1: the command in the documentation is out of date; the following one is correct:
      knock-knock install-example-data PROJECT_DIR
    • Issue2: need to install the following missing libraries first:
      conda install -c anaconda seaborn
      conda install -c anaconda scipy
      conda install -c anaconda statsmodels
  3. Build targets:

    • Command used:
      knock-knock build_targets PROJECT_DIR
    • Issue1: the command in the documentation is out of date; the following one is correct:
      knock-knock build-targets PROJECT_DIR
  4. The parallel command

    • Command used:
      knock-knock parallel PROJECT_DIR 4 --group pacbio
    • Error: AttributeError: 'PacbioExperiment' object has no attribute 'generate_alignments'. Did you mean: 'get_read_alignments'?
    • So I decided to test the knock-knock process command first. It hit the same error, so I then tested each --stage separately.
  5. The process --stage preprocess command: works well

  6. The process --stage align command:

  7. The process --stage categorize command:

    • Command used:
      knock-knock process PROJECT_DIR pacbio R_PCR --stage categorize
    • Error: 'PacbioExperiment' object has no attribute 'uncommon_read_type'
    • Solution: add self.uncommon_read_type = 'CCS' into ""
  8. The process --stage generate_figure command:

  9. The knock-knock table command:

    • Issue1: the documented command should be knock-knock table PROJECT_DIR, not knock-knock table BASE_DIR.
    • Concern1: although I am testing with the Pacbio dataset only, this command assumes that both Illumina and Pacbio results are in place. I had to remove the data/illumina folder for this command to stop complaining.
    • Error1: missing "exp.batch"
    • Solution: change "exp.batch" to "exp.batch_name" in the "" script
    • Error2: AttributeError: 'PacbioExperiment' object has no attribute 'experiment_group':
    • Solution: fixed by commenting out the corresponding lines from "", though I am not sure whether this is a legitimate fix:
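
To make the attribute-related workarounds in steps 7 and 9 concrete, here is a minimal sketch. It uses a hypothetical stand-in class (`ExperimentStub`) and a hypothetical helper (`describe`); it is not knock-knock's actual code, and whether `'CCS'` is the right value or whether defaulting `experiment_group` to `None` is acceptable are both assumptions on my part. The point is only that `getattr` with a default could be an alternative to commenting lines out.

```python
class ExperimentStub:
    """Hypothetical stand-in for an experiment class; not knock-knock's real code."""

    def __init__(self, batch_name):
        self.batch_name = batch_name
        # Workaround from step 7: define the attribute the categorize stage expects.
        # The value 'CCS' is taken from my fix above and may not be the intended one.
        self.uncommon_read_type = 'CCS'


def describe(exp):
    """Collect fields for a summary row, tolerating attributes that may be absent."""
    return {
        # Step 9, Error1: read batch_name rather than batch.
        'batch': exp.batch_name,
        'read_type': exp.uncommon_read_type,
        # Step 9, Error2: fall back to None instead of raising AttributeError.
        'experiment_group': getattr(exp, 'experiment_group', None),
    }


row = describe(ExperimentStub('pacbio'))
print(row)
```

If `experiment_group` is meaningful for Pacbio experiments, defining it on the class would presumably be the better fix; the fallback above only avoids the crash.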

After the above steps, I can successfully generate results using the example dataset, which are saved here: Can you take a look and let me know whether the results look correct to you?
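
For reference, the command sequence I ended up with can be sketched as the small script below. It only prints the commands by default (`dry_run=True`). The subcommand names are the ones reported above; passing `pacbio R_PCR` to every stage is an assumption, since I only showed the full command for the categorize stage, and `PROJECT_DIR` is a placeholder.

```python
import shlex
import subprocess

PROJECT_DIR = 'PROJECT_DIR'  # placeholder; replace with the real project directory

# Stage names as listed in steps 5 to 8 above.
STAGES = ['preprocess', 'align', 'categorize', 'generate_figure']


def build_commands(project_dir):
    """Return the knock-knock invocations in the order I ran them."""
    commands = [
        ['knock-knock', 'install-example-data', project_dir],
        ['knock-knock', 'build-targets', project_dir],
    ]
    for stage in STAGES:
        # Assumption: every stage takes the same group/sample arguments as categorize.
        commands.append(
            ['knock-knock', 'process', project_dir, 'pacbio', 'R_PCR', '--stage', stage]
        )
    commands.append(['knock-knock', 'table', project_dir])
    return commands


def run_all(project_dir, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in build_commands(project_dir):
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)


run_all(PROJECT_DIR)
```

Running the stages one at a time like this sidesteps the `generate_alignments` error from the combined `parallel` and `process` commands, but it does not fix it.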

  1. Another concern: I noticed that there are some hard-coded genomes in "" (see below for one example). Does this mean that other custom genomes are not supported?

Thank you for looking into this. If you are willing to review, I am happy to create a pull request with all the changes. Let me know.
