microsoft / hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.
MIT License
3.32k stars, 274 forks

CountVectorizer implementation #203

Open ksaur opened 3 years ago

ksaur commented 3 years ago

The existing CountVectorizer code uses TorchScript (JIT) constructs, such as this line in the forward function:

doc_ids = torch.jit.annotate(List[Tensor], [])  # noqa: F821

which we need to work around so that it doesn't fail at:

  File "/root/hummingbird/hummingbird/ml/_container.py", line 63, in forward
    raise RuntimeError("Inputer tensor {} of not supported type {}".format(input_name, type(inputs[i])))

because the input is not a tensor.

See this branch
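A minimal sketch of one possible workaround follows. It assumes the non-tensor input that trips the check above is the raw text, and that tokenization can run on the host before the compiled model is called, so the tensor part only ever sees integers. All function names here are hypothetical and do not come from Hummingbird or the linked branch. The tokenizer is a simple lowercase whitespace split, not scikit-learn's default regex.

```python
from typing import Dict, List


def build_vocabulary(docs: List[str]) -> Dict[str, int]:
    # Hypothetical helper: map each distinct lowercase whitespace token to a
    # column index, sorted so the column order is deterministic.
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def encode(docs: List[str], vocab: Dict[str, int], pad_id: int = -1) -> List[List[int]]:
    # Host-side step (hypothetical): strings -> padded matrix of integer ids.
    # Out-of-vocabulary tokens are dropped; short rows are padded with pad_id.
    rows = [[vocab[t] for t in doc.lower().split() if t in vocab] for doc in docs]
    width = max((len(r) for r in rows), default=0)
    return [r + [pad_id] * (width - len(r)) for r in rows]


def count_from_ids(ids: List[List[int]], n_features: int, pad_id: int = -1) -> List[List[int]]:
    # Numeric-only step: this is the part a tensor program could compute
    # (for example with a scatter-add), since it touches no strings.
    counts = [[0] * n_features for _ in ids]
    for row, out in zip(ids, counts):
        for tok_id in row:
            if tok_id != pad_id:
                out[tok_id] += 1
    return counts


# Usage example
docs = ["the cat sat", "the cat the dog"]
vocab = build_vocabulary(docs)             # {"cat": 0, "dog": 1, "sat": 2, "the": 3}
ids = encode(docs, vocab)                  # [[3, 0, 2, -1], [3, 0, 3, 1]]
counts = count_from_ids(ids, len(vocab))   # [[1, 0, 1, 1], [1, 1, 0, 2]]
```

Splitting the work this way keeps string handling out of the traced/scripted graph. The trade-off is that the vocabulary lookup is no longer part of the compiled model.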

Hemantr05 commented 3 years ago

@ksaur I'm solving the same problem in issue #293, as discussed in issue #164.

ksaur commented 3 years ago

Hi @hemantr05,

For issue #164, there are two parts:

We really appreciate your enthusiasm!! If you finish your current two issues (#293 and #273) you can get started on this third one! :) Let me know if you have questions or would like to change which issue you focus on! Thanks again!

interesaaat commented 3 years ago

Actually, you can take a look at the CountVectorizer code in this old branch.

ksaur commented 3 years ago

@interesaaat - I see that I had also already posted the old CV code in the original post above (see "this branch"). :-D Should I delete mine if you made changes in yours? (Otherwise they appear to be duplicates.)

interesaaat commented 3 years ago

I made no changes, let me delete mine then since it is not used.

Hemantr05 commented 3 years ago

@ksaur Sure. I'll finish the previously assigned issue first and get back to this.