metagenlab / MeSS

Snakemake pipeline for simulating shotgun metagenomic samples
https://metagenlab.github.io/MeSS
MIT License

TypeError: cannot convert the series to <class 'int'> #26

Open inspirewind opened 1 month ago

inspirewind commented 1 month ago

MeSS version

mess 0.9.0 pyhdfd78af_0 bioconda

Describe the bug

[Sun Sep 29 14:55:46 2024]
Finished job 7.
11 of 19 steps (58%) done
InputFunctionException in rule art_illumina in file /home/inspirewind/miniconda3/envs/mess/lib/python3.12/site-packages/mess/workflow/rules/simulate/short_reads.smk, line 38:
Error:
  TypeError: cannot convert the series to <class 'int'>
Wildcards:
  sample=sample
  fasta=GCF_016127215.1
  contig=NZ_CP065991.1
Traceback:
  File "/home/inspirewind/miniconda3/envs/mess/lib/python3.12/site-packages/mess/workflow/rules/simulate/short_reads.smk", line 49, in
  File "/home/inspirewind/miniconda3/envs/mess/lib/python3.12/site-packages/pandas/core/series.py", line 248, in wrapper (rule art_illumina, line 38, /home/inspirewind/miniconda3/envs/mess/lib/python3.12/site-packages/mess/workflow/rules/simulate/short_reads.smk)

Minimal example

Additional context

inspirewind commented 1 month ago

Based on the get_value function and the content of the CSV file, it appears that val is a Series rather than a scalar, because the same samplename, fasta, and contig index key matches multiple rows in the CSV, each with its own seed and cov_sim values.
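To make the failure mode concrete, here is a minimal sketch that reproduces it outside of Snakemake. The table contents are made up for illustration (only the column names and wildcard values come from this report), and the lookup mirrors the chained .loc calls in get_value. It also lists the duplicated index keys, which may help locate the offending rows in a real input table.

```python
import io

import pandas as pd

# Hypothetical table: the same (samplename, fasta, contig) key appears twice,
# with different seed and cov_sim values. The numbers are invented.
TSV = (
    "samplename\tfasta\tcontig\tseed\tcov_sim\n"
    "sample\tGCF_016127215.1\tNZ_CP065991.1\t1\t10.0\n"
    "sample\tGCF_016127215.1\tNZ_CP065991.1\t2\t20.0\n"
)

df = pd.read_csv(
    io.StringIO(TSV),
    sep="\t",
    index_col=["samplename", "fasta", "contig"],
)

# Same chained lookup as in get_value: with a duplicated key this yields
# a Series with one entry per matching row instead of a scalar.
val = df.loc["sample"].loc["GCF_016127215.1"].loc["NZ_CP065991.1"]["seed"]
is_series = isinstance(val, pd.Series)

# Converting a multi-element Series to int raises the TypeError from the log.
try:
    int(val)
    error = None
except TypeError as exc:
    error = str(exc)

# List every index key that occurs more than once in the table.
duplicated_keys = df.index[df.index.duplicated()].unique().tolist()

print(f"is_series: {is_series}")
print(f"error: {error}")
print(f"duplicated keys: {duplicated_keys}")
```

If duplicated_keys is empty for a given table, the lookup should return a scalar and this particular error should not occur.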

To ensure a scalar value is returned, we can add logic to handle this situation in the get_value function. For example, if val is a Series, choose to return the first value or handle it in another way. Modify get_value as follows:

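import pandas as pd

# Cache of parsed tables keyed by file path (assumed to be defined at module level).
table_cache = {}


def get_value(table, wildcards, value):
    # Parse each table only once and reuse the cached DataFrame afterwards.
    if table not in table_cache:
        df = pd.read_csv(
            table,
            sep="\t",
            index_col=["samplename", "fasta", "contig"],
        )
        table_cache[table] = df
    df = table_cache[table]
    # Yields a scalar for a unique index key, or a Series if the key is duplicated.
    val = df.loc[wildcards.sample].loc[wildcards.fasta].loc[wildcards.contig][value]

    print(f"[debug]: table: {table}")
    print(f"[debug]: df: {df}")
    print(f"[debug]: val: {val}")

    # Workaround: if several rows match, return the first value only.
    if isinstance(val, pd.Series):
        return val.iloc[0]
    return val

farchaab commented 1 month ago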

Hello @inspirewind, and thank you for using MeSS!

Can you provide an example input file under the minimal example section, so we can reproduce the bug and fix it?

On a side note, I encountered a similar bug and fixed it with this commit. However, I have not made a release yet, so the fix is not live on bioconda.

Could you try installing mess from source (see installation) and let me know if it fixes your error?