karel-brinda / MiniPhy

Phylogenetic compression of extremely large genome collections [661k ↘16GiB | BIGSIdata ↘48GiB | AllTheBact'23 ↘75GiB]
https://brinda.eu/mof

Missing int conversion by argparse in `create_batches.py` #94

Closed shenwei356 closed 8 months ago

shenwei356 commented 8 months ago
$ ./create_batches.py ../file2species.tsv -d input/ -s species -f file -c -m 200 
Loaded 1932811 genomes across 10357 species clusters 
Traceback (most recent call last): 
 File "./create_batches.py", line 191, in <module> 
   main() 
 File "./create_batches.py", line 187, in main 
   batching.run() 
 File "./create_batches.py", line 108, in run 
   self._create_dustbin() 
 File "./create_batches.py", line 64, in _create_dustbin 
   if len(fns) >= self.cluster_min_size: 
TypeError: '>=' not supported between instances of 'int' and 'str'

This could be fixed by:

-     if len(fns) >= self.cluster_min_size: 
+     if len(fns) >= int(self.cluster_min_size): 

It's strange, because you've set the argument type to int:

https://github.com/karel-brinda/MiniPhy/blob/main/create_batches.py#L126

karel-brinda commented 8 months ago

It's indeed a bug. It's weird that I haven't encountered it on my computer...

karel-brinda commented 8 months ago

Oh, I know why: I've always used the default values, which had the correct type.
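
For readers hitting the same error, here is a minimal sketch of how this kind of mismatch can arise with argparse. It is not the actual code from `create_batches.py`: the parser, the `build_parser` helper, and the default value of 100 are assumptions made for illustration. It only relies on documented argparse behaviour: a non-string default is stored as given, while a value passed on the command line stays a `str` unless `type=int` is set on that argument.

```python
import argparse


def build_parser(with_type):
    """Hypothetical parser; only the -m option mirrors the command in this issue."""
    parser = argparse.ArgumentParser()
    # Assumption: the default is an int literal (100 is an illustrative value).
    kwargs = {"type": int} if with_type else {}
    parser.add_argument("-m", dest="cluster_min_size", default=100, **kwargs)
    return parser


# Without type=int, the default keeps its int type...
no_type = build_parser(with_type=False)
default_value = no_type.parse_args([]).cluster_min_size

# ...but a value given on the command line arrives as a string.
cli_value = no_type.parse_args(["-m", "200"]).cluster_min_size

# Comparing an int with that string raises the TypeError from the traceback.
try:
    len(["a.fa", "b.fa"]) >= cli_value
    comparison_failed = False
except TypeError:
    comparison_failed = True

# With type=int, argparse converts the command-line value during parsing.
fixed_value = build_parser(with_type=True).parse_args(["-m", "200"]).cluster_min_size
```

Converting at the parser with `type=int` keeps every later use of the value consistent, whereas wrapping a single comparison in `int(...)` only fixes that one call site.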