rvalieris / LCS

Segmentation fault #17

Open LauraVP1994 opened 1 year ago

LauraVP1994 commented 1 year ago

Dear,

I have two questions regarding your tool. I used the following command:

    snakemake --config markers=ucsc dataset=mypool --cores 20 --resources mem_gb=500

  1. As you can see in the attached log (2023-05-11T150724.360433.snakemake.log), I get a segmentation fault, and I'm not sure how to solve it.

  2. I see in the log that it is looking for variants such as B.1.1, B.1.1.7, ... However, these samples are from March 2023, so they should only contain Omicron variants. How do I get results for the Omicron variants? As you can see in the attached config file (config.py.txt), it should have been updated. I have also tried to follow the instructions for the pangolin method, but it's not clear how to configure this.

Thank you

rvalieris commented 1 year ago

Hello,

  1. It looks like matUtils is crashing here. This might be caused by an issue with the input files or with your system. Are you using Linux or Mac? And which version of usher is installed? You can check by running conda list usher inside the env.

  2. The file data/variant_groups.tsv contains the list of variant groups that will be used for markers. You can remove the variants you are not interested in from this file, or add new ones, then re-run the marker generation.

I recommend resolving (2) first: delete the outputs/ucsc-vcf directory and try running the marker generation again.

If that doesn't work, check the input files and try running the failing command manually:

  1. Make sure the sample-list files inside the directory outputs/ucsc-vcf were created correctly.
  2. Try to run:

    matUtils extract -i data/ucsc-sars-cov-2/public-2023-05-10.all.masked.pb.gz -s outputs/ucsc-vcf/Omicron_BA.5.sample-list -a 3 -v outputs/ucsc-vcf/Omicron_BA.5.vcf.gz

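
For step 1 above, a quick way to spot missing or empty sample lists might look like the sketch below. The helper name check_sample_lists is hypothetical and not part of LCS, and it assumes each .sample-list file holds one sample name per line, which the thread does not confirm.

```shell
# Hypothetical helper (not part of LCS): list every *.sample-list file in a
# directory with its number of non-blank lines, and flag empty files.
# Assumption: one sample name per line.
check_sample_lists() (
    dir="${1:-outputs/ucsc-vcf}"   # default is the directory named in the thread
    status=0
    found=0
    for f in "$dir"/*.sample-list; do
        [ -e "$f" ] || continue            # the glob matched nothing
        found=1
        n=$(grep -c . "$f" || true)        # count non-blank lines
        if [ "$n" -eq 0 ]; then
            echo "EMPTY $f"
            status=1
        else
            echo "OK    $f ($n lines)"
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no .sample-list files found in $dir"
        status=1
    fi
    [ "$status" -eq 0 ]                    # exit status: 0 only if all lists look usable
)

# Usage, from the LCS working directory:
#   check_sample_lists outputs/ucsc-vcf
```

If every file is reported as OK, the sample lists themselves are probably not the problem, and running the matUtils command from step 2 by hand is the next thing to try.
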
LauraVP1994 commented 1 year ago

Hello,

  1. To answer the first question, I use Linux, and this is the installed usher version:

    conda list usher
    # packages in environment at /home/lavanpoelvoorde/miniconda3/envs/lcs:
    #
    # Name                    Version                   Build  Channel
    usher                     0.6.2                h99b1ad8_0    bioconda

  2. So you don't provide a pre-generated outputs/ucsc-vcf covering all, or the most recent, variants? Is this tool then able to distinguish BA.2.75.1 from BA.2.75.2, or BQ.1 from BQ.1.1, for example?

rvalieris commented 1 year ago

  1. Looks OK, so this could be an issue with the input then.

  2. I don't have pre-generated markers for these variants, but you can generate markers for any variants you want by editing the data/variant_groups.tsv file and adding the variants you desire, example:

    variant_group pango_lineage
    BA.2.75.1 BA.2.75.1
    BA.2.75.2 BA.2.75.2
    BQ.1 BQ.1
    BQ.1.1 BQ.1.1
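
Rows like these could also be appended from the shell, as in the rough sketch below. The helper name add_variant_groups is hypothetical and not part of LCS. It assumes the file is tab-separated (as the .tsv extension suggests), that it ends with a newline, and that each lineage gets its own group with the same name, as in the example above.

```shell
# Hypothetical helper (not part of LCS): append "<lineage><TAB><lineage>" rows
# to a variant_groups.tsv-style file, skipping rows that already exist.
# Assumptions: tab-separated columns, and the file ends with a newline.
add_variant_groups() (
    tsv="$1"
    shift
    for lineage in "$@"; do
        row=$(printf '%s\t%s' "$lineage" "$lineage")
        if grep -qxF "$row" "$tsv"; then   # -x: whole line, -F: literal dots
            echo "already present: $lineage"
        else
            printf '%s\n' "$row" >> "$tsv"
            echo "added: $lineage"
        fi
    done
)

# Usage, from the LCS working directory:
#   add_variant_groups data/variant_groups.tsv BA.2.75.1 BA.2.75.2 BQ.1 BQ.1.1
# Afterwards, delete outputs/ucsc-vcf and re-run the marker generation,
# as suggested earlier in this thread.
```
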

How well it can distinguish very similar variants is another question. I have not tested this, but I think it depends heavily on how well covered your pooled samples are.