Closed DaHaiHuha closed 4 years ago
So as far as I know the two pieces of code should do the same thing?
The original will return a boolean mask where the sequence is equal to the id, while the other will return an index mask instead. This alteration should be categorically worse since it does the loop in python instead of numpy. Do you have an example of this causing a bug?
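For reference, the expected equivalence of the two masking approaches can be sketched as follows (the values of `seq` and `pfam_encoding` are made up here, assuming `seq` is a numpy array and `pfam_encoding` a scalar):

```python
import numpy as np

# Hypothetical example data.
seq = np.array([3, 7, 3, 9, 3])
pfam_encoding = 3

# Original: boolean mask via vectorized numpy comparison.
bool_mask = seq == pfam_encoding

# Proposed alternative: index mask built with a Python-level loop.
idx_mask = [i for i, x in enumerate(seq) if x == pfam_encoding]

# Both select the same elements.
assert np.array_equal(seq[bool_mask], seq[idx_mask])
```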
In my experiment, `seq == pfam_encoding` returned a single `False` rather than a mask, which led to a problem. But maybe this is caused by a different type of data input?
I believe that `seq` should be a numpy array and `pfam_encoding` should be a single number. The only way the current code would fail while your proposed fix would succeed is if `seq` is a list. But TensorFlow, as far as I know, should make `seq` a numpy array when using `tf.py_func`.
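The list-vs-array distinction can be demonstrated directly (example values assumed):

```python
import numpy as np

seq_list = [3, 7, 3]
seq_arr = np.array([3, 7, 3])
pfam_encoding = 3

# On a plain Python list, == compares the whole list to the scalar,
# producing a single bool rather than a mask.
print(seq_list == pfam_encoding)   # False

# On a numpy array, == broadcasts elementwise and yields a boolean mask.
print(seq_arr == pfam_encoding)    # [ True False  True]
```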
Can you tell me the types of `seq` and `pfam_encoding` when you run your code?
I mean your code is fine and I can merge it if there is a bug, but I'd like to double check in case this is a sign of some other issue since as far as I know the two should be identical.
Thanks for your carefulness and pursuit of perfection! I ran into this problem when using the `convert_sequence_vocab` function as a standalone function in PyTorch. Thanks for your explanation; I learned something new.
The original returns a `seq` full of 0s, so an index mask needs to be generated before assigning the values.
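A sketch of that failure mode, assuming `seq` arrives as a plain Python list and the output array is initialized to zeros (names and values here are illustrative, not the actual function):

```python
import numpy as np

seq = [3, 7, 3]                  # a plain list, not a numpy array
pfam_encoding = 3
new_seq = np.zeros(len(seq), dtype=np.int64)

# The comparison collapses to a single False (list vs. int), so the
# "mask" selects zero elements and the assignment writes nothing,
# leaving new_seq full of zeros.
mask = seq == pfam_encoding
new_seq[mask] = 1

# With an explicit index mask, the assignment works as intended.
idx = [i for i, x in enumerate(seq) if x == pfam_encoding]
new_seq[idx] = 1
print(new_seq)                   # [1 0 1]
```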