A quick fix would be to default to 0:

```python
max_len = max((len(i) for i in sequences), default=0)
```

But this may be a symptom of a bigger issue (it happens at the beginning of the pipeline -- why would the pipeline try to embed an empty set of sequences?).
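For reference, a minimal standalone sketch (not the actual pipeline code) of how the defaulted call behaves on empty and non-empty input; note the generator expression has to be parenthesized once a second argument is passed:

```python
# Standalone sketch: the defaulted max() call on empty vs. non-empty input.
empty_batch = []
full_batch = ["SEQWENCE", "PRTEIN"]

print(max((len(i) for i in empty_batch), default=0))  # 0 instead of ValueError
print(max((len(i) for i in full_batch), default=0))   # 8
```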
CC @konstin
I had already fixed that once, but only in `embed_many` and not also in `embed_batch`:
I guess the empty sets are artifacts of the batching, but we have to support them for the Python API anyway, so I think they're fine.
Alright. I suppose the `default` fix should work, as does the `if not` check, no? The former comes with the plus that you don't need to cast the iterable to a list.
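For comparison, a rough sketch of the two variants under discussion (the function names here are mine, not the library's):

```python
from typing import Iterable

# Variant 1: default=0 works directly on any iterable, no list() cast needed.
def max_len_via_default(sequences: Iterable[str]) -> int:
    return max((len(s) for s in sequences), default=0)

# Variant 2: an explicit "if not" check needs the iterable materialized first,
# because a generator object is always truthy even when it yields nothing.
def max_len_via_if_not(sequences: Iterable[str]) -> int:
    sequences = list(sequences)
    if not sequences:
        return 0
    return max(len(s) for s in sequences)
```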
IIRC I tried `default=0`, but it fails because now you iterate twice (once for `_assert_max_len` and once for embedding), so if you pass an actual iterator instead of a list, it's empty on the second iteration and no embeddings will be produced.
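A toy illustration of that pitfall (the function and names below are placeholders, not the real pipeline API):

```python
# Placeholder names, not the real pipeline API.
def embed_all(sequences):
    # First pass consumes the iterable for the length check ...
    max_len = max((len(s) for s in sequences), default=0)
    # ... so if `sequences` is a generator, the second pass sees nothing.
    return [f"embed({s})" for s in sequences]

print(embed_all(["SEQ", "PRTEIN"]))             # list: iterated twice, 2 embeddings
print(embed_all(s for s in ["SEQ", "PRTEIN"]))  # generator: silently returns []
```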
Aha! Good catch. I implemented it with `tee`.
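Roughly along these lines, I assume (a sketch of the idea, not the exact commit):

```python
from itertools import tee

def embed_all(sequences):
    # tee() splits the incoming iterable into two independent iterators,
    # so the length check no longer exhausts the one used for embedding.
    check_iter, embed_iter = tee(sequences)
    max_len = max((len(s) for s in check_iter), default=0)
    return [f"embed({s})" for s in embed_iter]

print(embed_all(s for s in ["SEQ", "PRTEIN"]))  # now also works for generators
```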
Fixed in a5e6bce76dbf0de964b84abed891b3f4dca4be21
Everything works now on my end. A thousand thanks for the super fast fix!
@HannesStark reports:
Basically, this means `max` is run on an empty sequence. Seems strange to me... @HannesStark please add config & fasta here.
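For context, a minimal standalone reproduction of that failure, assuming an empty batch reaches the length check (this is an assumed repro, not the original traceback):

```python
# max() over an empty generator of lengths raises immediately.
sequences = []
max_len = max(len(i) for i in sequences)
# ValueError: max() arg is an empty sequence
```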