robertmccraith / mimm

MLX Image Models

Is there a way to join forces with `mlx-image`? #1

Open ligaz opened 8 months ago

ligaz commented 8 months ago

Hey @robertmccraith and @riccardomusmeci,

Is there a way to combine these two great libraries, mimm and mlx-image, so that there is a single go-to library for vision on MLX?

Thanks!

robertmccraith commented 8 months ago

Indeed, there are a lot of similarities between the two. It's unfortunate that there is some duplication, rather than Riccardo contributing or proposing changes here if he felt something was lacking.

In this library the focus is on converting PyTorch weights to MLX, so you can take weights from any source and use them immediately to initialise your model. The focus in mlx-image seems to be integration with Hugging Face. I felt that, at least for the built-in initialisation, we should reference the original torchvision/Hugging Face source rather than republish pre-converted weights, which seems to be the mlx-image way.
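For illustration only, and not taken from mimm's code: one step that any PyTorch-to-MLX weight conversion has to handle is the convolution kernel layout. PyTorch stores 2-D convolution weights as (out, in, height, width), while MLX's `Conv2d` expects (out, height, width, in). The sketch below shows that step with NumPy arrays standing in for the checkpoint tensors. The function name, the parameter names, and the rule that every 4-D array is a convolution kernel are assumptions made for the example.

```python
import numpy as np


def torch_to_mlx_layout(state):
    """Return a copy of `state` with 4-D arrays moved from OIHW to OHWI.

    Hypothetical helper: it assumes every 4-D array is a 2-D convolution
    kernel, which a real converter would need to verify per layer.
    """
    converted = {}
    for name, array in state.items():
        if array.ndim == 4:
            # (out, in, h, w) -> (out, h, w, in)
            converted[name] = np.transpose(array, (0, 2, 3, 1))
        else:
            # Linear weights, biases and norm parameters keep their layout.
            converted[name] = array
    return converted


# Toy "checkpoint" with made-up parameter names and shapes.
state = {
    "conv1.weight": np.zeros((64, 3, 7, 7), dtype=np.float32),
    "fc.weight": np.zeros((1000, 512), dtype=np.float32),
}
mlx_ready = torch_to_mlx_layout(state)
print(mlx_ready["conv1.weight"].shape)  # (64, 7, 7, 3)
```

A real converter would also have to deal with parameter renaming and with layers whose 4-D tensors are not plain convolution kernels, which this sketch ignores.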

mlx-image also seems solely interested in classification, whereas here I want to expand to other tasks, which I think is more useful in the long term.

mlx-image creates PyTorch-like versions of data loading functions, whereas here I've opted to use mlx data loading strategies. The only downside so far has been that ImageNet classes are ordered differently in the mlx loader (but this is very minor).
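A class-order mismatch like this is usually handled with a one-off index remap. The sketch below is a minimal example under stated assumptions, not code from either library: it assumes both orderings are available as lists of class identifiers, and the function name and the three-class lists are made up for the example.

```python
def build_label_remap(loader_classes, reference_classes):
    """Map each loader label index to the index of the same class
    in the reference ordering (hypothetical helper)."""
    reference_index = {name: i for i, name in enumerate(reference_classes)}
    return [reference_index[name] for name in loader_classes]


# Three-class toy example; the real ImageNet-1k lists have 1000 entries.
reference_classes = ["n01440764", "n01443537", "n01484850"]
loader_classes = ["n01484850", "n01440764", "n01443537"]

remap = build_label_remap(loader_classes, reference_classes)

# Labels as produced by the loader, translated to the order the model expects.
loader_labels = [0, 2, 1, 0]
reference_labels = [remap[label] for label in loader_labels]
print(reference_labels)  # [2, 1, 0, 2]
```

Building the remap once and applying it to the labels keeps the pretrained classifier head untouched, at the cost of one extra lookup per sample.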

I proposed that we find or create an organisation to host these kinds of projects so that we could merge the repos in some way, but this didn't seem to be of interest to mlx-image.

riccardomusmeci commented 7 months ago

With mlx-image I am trying to build a community on HF, since the Apple ML Research team is going this way for LLMs, so I thought it was the best approach. Also, the HF team is helping me find the best way for users to use the library in a "free-lunch" way. The idea of having torch-like classes and tools is to make the switch from PyTorch as easy as possible.

Also, mlx-image was born with the idea of being a port of timm to Apple MLX, which is why the focus is on classification. My idea would be to have different libraries for different CV tasks (e.g. detection, segmentation, multimodal, etc.) with a foundation library such as timm.

Anyway, happy to discuss a way to join forces 💪 I really appreciate @robertmccraith's work, I think it's great.