qiyunzhu / woltka

Woltka: a versatile meta'omic data classifier
BSD 3-Clause "New" or "Revised" License

AssertionError: Conflicting values found for "RXN-13667". #200

Open deppworld opened 6 months ago

deppworld commented 6 months ago

Hi, I am getting this error when running the following command:

woltka classify \
  --input indir \
  --coords coords.txt \
  --map gene-to-protein.map \
  --map protein-to-enzrxn.map \
  --map enzrxn-to-reaction.map \
  --map reaction-to-pathway.map \
  --map pathway-to-super.map \
  --rank gene,protein,enzrxn,reaction,pathway,super \
  --output outdir

Constructing classification system...
Will extract rank name from map filename.
Parsing simple map file: gene-to-protein.txt... Done.
Parsing simple map file: protein-to-enzrxn.txt... Done.
Parsing simple map file: enzrxn-to-reaction.txt... Done.
Parsing simple map file: reaction-to-pathway.txt...Traceback (most recent call last):
  File "/home/dverma2/miniforge3/bin/woltka", line 8, in <module>
    sys.exit(cli())
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/cli.py", line 195, in classify_cmd
    workflow(**kwargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/workflow.py", line 122, in workflow
    tree, rankdic, namedic, root = build_hierarchy(
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/workflow.py", line 801, in build_hierarchy
    update_dict(rankdic, {k: rank for k in set(map.values())})
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/util.py", line 75, in update_dict
    add_dict(dic, key, value)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/util.py", line 44, in add_dict
    assert dic[key] == value, f'Conflicting values found for "{key}".'
AssertionError: Conflicting values found for "RXN-13667".

Kindly help me to resolve this.

qiyunzhu commented 6 months ago

@deppworld I checked this file: http://ftp.microbio.me/pub/wol-20April2021/function/metacyc/reaction-to-pathway.txt. There seem to be some duplicates in the placement of RXN-13667. You may try removing all lines that contain RXN-13667 and see whether the program works. If so, this entry might be a special case that we need to deal with.
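The suggestion above (dropping every line that mentions RXN-13667) can be sketched in a few lines of Python. This is only an illustration, not part of Woltka: it assumes the map file is tab-separated, the helper name `drop_id` is hypothetical, and every ID in the sample other than RXN-13667 is made up.

```python
def drop_id(lines, bad_id):
    """Return the map lines in which no tab-separated field equals bad_id."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Exact field comparison avoids dropping IDs that merely contain bad_id
        # as a substring (e.g. a hypothetical "RXN-136670").
        if bad_id not in fields:
            kept.append(line)
    return kept


# Hypothetical sample lines; only RXN-13667 comes from the error message.
sample = [
    "RXN-0001\tPWY-0001\n",
    "RXN-13667\tPWY-0002\n",
    "RXN-0003\tRXN-13667\n",
]
filtered = drop_id(sample, "RXN-13667")
print("".join(filtered), end="")
```

To apply it to a real file, read the lines with `open(path).readlines()` and write the filtered list to a new file so that the original map stays untouched.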

deppworld commented 6 months ago

@qiyunzhu Hi, thanks for your prompt response. I removed this ID and tried again, but many other IDs are duplicated in two reference files (reaction-to-pathway.txt and pathway-to-super_pathway.txt). I am removing them and will report back here on whether it works.
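Rather than removing IDs one failure at a time, the conflicting IDs could be listed up front. The sketch below is modeled on the assertion visible in the traceback (each target ID gets one rank, and a second, different rank raises the error); it is not Woltka's code. It assumes tab-separated map files with the source ID in the first column and one or more targets after it. The function names and all IDs except RXN-13667 are hypothetical, and whether this is the actual reason RXN-13667 fails is also an assumption.

```python
from collections import defaultdict


def parse_targets(lines):
    """Collect every target ID (all columns after the first) from map lines."""
    targets = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        targets.update(f for f in fields[1:] if f)
    return targets


def find_rank_conflicts(targets_by_rank):
    """Return {id: [ranks]} for IDs that appear as targets under several ranks.

    targets_by_rank is an ordered list of (rank_name, set_of_target_ids).
    """
    seen = {}
    conflicts = defaultdict(set)
    for rank, ids in targets_by_rank:
        for id_ in ids:
            first = seen.setdefault(id_, rank)
            if first != rank:
                conflicts[id_].update({first, rank})
    return {k: sorted(v) for k, v in conflicts.items()}


# Hypothetical map contents in which RXN-13667 is a target at two ranks.
reaction_lines = ["ENZRXN-1\tRXN-13667\n", "ENZRXN-2\tRXN-0002\n"]
pathway_lines = ["RXN-0002\tPWY-0001\n", "RXN-0009\tRXN-13667\n"]

conflicts = find_rank_conflicts([
    ("reaction", parse_targets(reaction_lines)),
    ("pathway", parse_targets(pathway_lines)),
])
print(conflicts)
```

Running this over the real map files in rank order would give the full list of IDs to inspect before editing anything.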

qiyunzhu commented 6 months ago

@deppworld In your situation, I think classifying to just genes (ORFs) followed by multiple collapses is more appropriate. See here for an example.

deppworld commented 6 months ago

@qiyunzhu Hi, I ran the following command:
woltka collapse -i C003.sam \
  -m protein-to-enzrxn.txt -n enzrxn_name.txt \
  -m enzrxn-to-reaction.txt -n reaction_name.txt \
  -m reaction-to-pathway.txt -n pathway_name.txt \
  -m pathway-to-super_pathway.txt -n pathway_name.txt \
  -m pathway_type.txt -n all_class_name.txt \
  -m protein-to-gene.map -n gene_name.txt \
  -m protein-to-go.map \
  -m enzrxn-to-regulation.map \
  -m regulation-to-regulator.map -n compound_name.txt \
  -o ./test

and I am getting the following error:

Traceback (most recent call last):
  File "/home/dverma2/miniforge3/bin/woltka", line 8, in <module>
    sys.exit(cli())
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/cli.py", line 234, in collapse_cmd
    collapse_wf(**kwargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/tools.py", line 232, in collapse_wf
    table, = read_table(input_fp)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/table.py", line 156, in read_table
    table = read_tsv(fh)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/table.py", line 240, in read_tsv
    data.append([int(x) if x.isdigit() else float(x)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/table.py", line 240, in <listcomp>
    data.append([int(x) if x.isdigit() else float(x)
ValueError: could not convert string to float: 'G000154205'

In addition, the reference folder is missing some files (protein.map.xz, gene-to-protein.map.xz).
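The ValueError in the traceback above comes from a step that reads the input as a table of numbers, and the command passed a SAM alignment file (C003.sam) as input. A likely reading, offered here as an assumption, is that the collapse step expects a profile table and choked on the reference-name column of a SAM record. The sketch below reuses the cell-parsing expression shown in the traceback; the helper names and both sample rows are hypothetical.

```python
def parse_cell(x):
    """Parse one table cell the way the traceback line does."""
    return int(x) if x.isdigit() else float(x)


def first_bad_cell(line):
    """Return the first non-numeric data cell of a TSV row, or None.

    The first column is skipped because it holds the feature or read ID.
    """
    for cell in line.rstrip("\n").split("\t")[1:]:
        try:
            parse_cell(cell)
        except ValueError:
            return cell
    return None


# Hypothetical rows: a profile row (ID followed by counts) and a SAM-like
# record, whose third column is the reference sequence name.
profile_row = "G000154205_1\t12\t0\t3.5"
sam_row = "read1\t0\tG000154205\t100\t255\t50M\t*\t0\t0\t*\t*"

print(first_bad_cell(profile_row))
print(first_bad_cell(sam_row))
```

If this reading is right, the alignment would first need to be classified into a profile, and the resulting profile would then be the input for collapsing.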

qiyunzhu commented 6 months ago

Which reference folder are you referring to? WoL1 or WoL2?

deppworld commented 6 months ago

WoL2

deppworld commented 6 months ago

Hi, I tried the collapse command, but I could not get gene-to-protein.biome because there is no reference file in the folder (https://ftp.microbio.me/pub/wol-20April2021/function/metacyc/). I then tried gene-to-pathway directly, but I am getting the following error:

(base) dverma2@EB-ONC-J8Q3FF3:/mnt/f/woltka$ woltka collapse -i gene.biom -m gene-to-pathway.txt -o pathway.biom
Number of features before collapsing: 435442.
Reading mapping file: gene-to-pathway.txt... Done.
Collapsing profile...Traceback (most recent call last):
  File "/home/dverma2/miniforge3/bin/woltka", line 8, in <module>
    sys.exit(cli())
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/cli.py", line 234, in collapse_cmd
    collapse_wf(**kwargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/tools.py", line 270, in collapse_wf
    round_table(table, digits or None)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/table.py", line 447, in round_table
    round_biom(table, digits)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/biom.py", line 195, in round_biom
    tmd.data = np.vectorize(f)(tmd.data).astype('float64')
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 2372, in __call__
    return self._call_as_normal(*args, **kwargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 2365, in _call_as_normal
    return self._vectorize_call(func=func, args=vargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 2450, in _vectorize_call
    ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 2406, in _get_ufunc_and_otypes
    raise ValueError('cannot call vectorize on size 0 inputs '
ValueError: cannot call vectorize on size 0 inputs unless otypes is set

I also tried classify, but I am getting a different error:

(base) dverma2@EB-ONC-J8Q3FF3:/mnt/f/woltka$ woltka classify -i gene.biom -m gene-to-pathway.txt -o pathway.biom
Traceback (most recent call last):
  File "/home/dverma2/miniforge3/bin/woltka", line 8, in <module>
    sys.exit(cli())
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/cli.py", line 195, in classify_cmd
    workflow(**kwargs)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/workflow.py", line 112, in workflow
    samples, files, demux = parse_samples(
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/workflow.py", line 432, in parse_samples
    map = id2file_from_map(fp)
  File "/home/dverma2/miniforge3/lib/python3.10/site-packages/woltka/file.py", line 335, in id2file_from_map
    for line in fh:
  File "/home/dverma2/miniforge3/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
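The byte 0x89 at position 0 is a useful clue: it is the first byte of the HDF5 file signature, and BIOM 2.x tables are stored as HDF5. The traceback shows the input file being read line by line as text, so it seems likely (an inference, not something confirmed in this thread) that the binary gene.biom was treated as a text file here. A minimal check of a file's first bytes, with hypothetical helper names and throwaway temporary files standing in for real inputs:

```python
import os
import tempfile

# First eight bytes of every HDF5 file, per the HDF5 file format.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file starts with the HDF5 signature."""
    with open(path, "rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def write_temp(data):
    """Write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return path


# Stand-ins for a binary BIOM table and a plain-text file (hypothetical data).
binary_path = write_temp(HDF5_SIGNATURE + b"placeholder payload")
text_path = write_temp(b"sample1\tsample1.sam\n")
binary_is_hdf5 = looks_like_hdf5(binary_path)
text_is_hdf5 = looks_like_hdf5(text_path)
os.remove(binary_path)
os.remove(text_path)

# Decoding the signature as UTF-8 fails on the very first byte.
try:
    HDF5_SIGNATURE.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as err:
    decode_error = str(err)

print(binary_is_hdf5, text_is_hdf5)
print(decode_error)
```

If gene.biom passes this check, it is an existing profile rather than an alignment, which would make it an input for collapsing rather than for classification.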

I think the IDs do not match because one step (gene to protein) is skipped. I hope you can help me with this.
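The ID-mismatch hypothesis can be checked directly by counting how many feature IDs in the profile appear as source IDs in the mapping file. The sketch below assumes a tab-separated map with the source ID in the first column; the function name and all sample IDs are hypothetical. A count of zero would leave nothing to collapse, which would be consistent with the "size 0 inputs" error, though that link is an assumption.

```python
def map_coverage(feature_ids, map_lines):
    """Return (matched, total): how many feature IDs are sources in the map."""
    sources = {
        line.split("\t", 1)[0].strip()
        for line in map_lines
        if line.strip()
    }
    features = set(feature_ids)
    return len(features & sources), len(features)


# Hypothetical data: gene-style feature IDs against a map keyed by protein IDs.
feature_ids = ["G000154205_1", "G000154205_2"]
map_lines = ["PROT-0001\tPWY-0001\n", "PROT-0002\tPWY-0002\n"]

matched, total = map_coverage(feature_ids, map_lines)
print(f"{matched} of {total} feature IDs found in the map")
```

For a real BIOM profile, the feature IDs would have to be extracted first, for example with the biom-format package, which is not shown here.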

jolespin commented 3 months ago

@qiyunzhu It looks like the FTP might be down right now.