pg-space / panspace

Embedding-based indexing for compact storage, rapid querying, and curation of bacterial pan-genomes
GNU General Public License v3.0

RuntimeError: dictionary changed size during iteration #1

Closed (leoisl closed this issue 7 months ago)

leoisl commented 1 year ago

Hello!

I have started testing this repo and I am hitting the following error (I ran the command twice and got the same error both times):

python src/fcgr.py -k 6 --dir-tarfiles data -w 4
Working on salmonella_enterica__01.tar: 100%|███████████████████████████████████████████████████████████████████████████████████████| 4000/4000 [00:04<00:00, 855.95it/s]
number of tarfiles:  33%|██████████████████████████████████████                                                                            | 1/3 [01:15<02:30, 75.06s/it]data/salmonella_enterica__01.tar.xz is done!|██████████████████████████████████████████████████████████████████████████████████████▎| 3966/4000 [00:04<00:00, 720.42it/s]

Working on escherichia_coli__01.tar:   3%|███▍                                                                                                 | 137/4000 [03:32<1:39:44,  1.55s/it]
Working on mycobacterium_tuberculosis__01.tar: 100%|█████████████████████████████████████████████████████████████████████████████████| 4000/4000 [38:10<00:00,  1.75it/s]
number of tarfiles:  67%|██████████████████████████████████████████████████████████████████████████▋                                     | 2/3 [38:49<22:37, 1357.15s/it]data/mycobacterium_tuberculosis__01.tar.xz is done!
Traceback (most recent call last):
  File "src/fcgr.py", line 57, in <module>
    for result in executor.map(fcgr.fcgr_from_tar, tarfiles):
  File "/home/leandro/miniconda3/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/home/leandro/miniconda3/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/home/leandro/miniconda3/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/home/leandro/miniconda3/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/leandro/git/embedding-bacteria/src/fcgr/fcgr_from_tar.py", line 47, in fcgr_from_tar
    m = self.__call__(seqbio.seq)
  File "/home/leandro/git/embedding-bacteria/venv/lib/python3.8/site-packages/complexcgr/fcgr.py", line 33, in __call__
    for kmer, freq in self.freq_kmer.items():        
RuntimeError: dictionary changed size during iteration
number of tarfiles:  67%|██████████████████████████████████████████████████████████████████████████▋                                     | 2/3 [38:49<19:24, 1164.87s/it]

The data directory contains three .tar.xz files with 4,000 genomes each:

ls -lh data/
total 341M
-rw-rw-r-- 1 leandro leandro 174M Jan  6 13:56 escherichia_coli__01.tar.xz
drwxrwxr-x 5 leandro leandro 4.0K Jan  6 14:01 fcgr-6mer
-rw-rw-r-- 1 leandro leandro  89M Jan  6 13:53 mycobacterium_tuberculosis__01.tar.xz
-rw-rw-r-- 1 leandro leandro  78M Jan  6 13:55 salmonella_enterica__01.tar.xz

I was wondering if I could get some help with this. I can send you the input if needed.

jorgeavilacartes commented 1 year ago

Hello!

This error was fixed in commit 846fa38 on the master branch. Please try again and let me know.
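
For readers who hit the same message: the traceback passes through concurrent/futures/thread.py and fails while iterating over self.freq_kmer, which is consistent with one object being shared by several worker threads while each call refills a dictionary stored on that object. That is an inference from the traceback only, not a description of what commit 846fa38 changed. A minimal sketch of one way to avoid this class of error is to keep the per-call counts in a local variable. The names KmerCounter and count_all below are hypothetical and are not part of complexcgr or this repo.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class KmerCounter:
    """Hypothetical stand-in for a k-mer frequency counter (not the complexcgr API)."""

    def __init__(self, k: int):
        # Read-only configuration: safe to share between threads.
        self.k = k

    def __call__(self, seq: str) -> dict:
        # Keep the counts in a local variable instead of an attribute on
        # `self`, so no call ever iterates over a dict that another
        # thread is filling at the same time.
        freq = Counter(seq[i:i + self.k] for i in range(len(seq) - self.k + 1))
        return dict(freq)


def count_all(seqs, k=2, workers=4):
    """Count k-mers for each sequence using one shared, stateless counter."""
    counter = KmerCounter(k)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map returns results in the same order as the inputs.
        return list(executor.map(counter, seqs))


results = count_all(["ACGTAC", "AAAA", "ACAC"])
```

An alternative with the same effect is to create a separate counter object per task, which costs a little more memory but leaves the counter class itself unchanged.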