tattle-made / feluda

A configurable engine for analysing multi-lingual and multi-modal content.
https://tattle.co.in/products/feluda/
GNU General Public License v3.0

Finalize Public API #409

Open dennyabrain opened 1 month ago

dennyabrain commented 1 month ago

Overview

We have to finalize the public touchpoints for feluda. The two touchpoints are:

  1. the python code written while using feluda
  2. syntax of the config.yml files

Ideally, the python code to use feluda should be minimal and intuitive, even for non-developers. Something like:

import Feluda

feluda = Feluda("config-server.yml")
feluda.init()
feluda.start()

The config.yml file is where you declare the operators, store, etc. that you require. We need to ensure that the syntax of the various configuration objects is consistent, and that error handling and helpful error messages are in place for malformed config files.
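As a rough illustration of the error handling described here, the sketch below validates an already-parsed config (a plain dict, as a YAML loader would return) and reports every problem at once. The names ConfigError and validate_operators, and the set of required keys, are assumptions made for illustration, loosely modeled on the operators block used elsewhere in this issue; they are not feluda's actual schema or API.

```python
# Hypothetical sketch: validate a parsed config.yml (a plain dict) and
# report every problem at once with a readable message.
# None of these names come from feluda; they are illustrative assumptions.


class ConfigError(ValueError):
    """Raised when the config is malformed; carries all problems found."""

    def __init__(self, problems):
        self.problems = list(problems)
        lines = "\n".join(f"  - {p}" for p in self.problems)
        super().__init__("Invalid config:\n" + lines)


def validate_operators(config):
    """Return the list of operator entries, or raise ConfigError."""
    section = config.get("operators")
    if not isinstance(section, dict):
        raise ConfigError(["missing top-level 'operators' mapping"])

    entries = section.get("parameters")
    if not isinstance(entries, list):
        raise ConfigError(["'operators.parameters' must be a list of operator entries"])

    problems = []
    for i, entry in enumerate(entries):
        where = f"operators.parameters[{i}]"
        if not isinstance(entry, dict):
            problems.append(f"{where} must be a mapping, got {type(entry).__name__}")
            continue
        for key in ("name", "type"):
            value = entry.get(key)
            if not isinstance(value, str) or not value:
                problems.append(f"{where} is missing a non-empty string '{key}'")
        # 'parameters' is treated as optional here, but must be a mapping if present.
        if "parameters" in entry and not isinstance(entry["parameters"], dict):
            problems.append(f"{where}.parameters must be a mapping")

    if problems:
        raise ConfigError(problems)
    return entries


# Example: a malformed config produces one message listing both problems.
try:
    validate_operators({"operators": {"parameters": [{"name": "image vectors"}, "oops"]}})
except ConfigError as err:
    print(err)
```

Collecting all problems before raising means a user fixes a malformed file in one pass instead of rerunning after each error.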

The other thing to standardize is how operators are used within the user's python code. Currently there are two ways:

  1. from core.operators import media_file_hash and then using media_file_hash.run(media)
  2. feluda.operators["media_hash"].run(media)

We can also consider using metaprogramming to enable feluda.operators.media_hash.run(media). The metaprogramming approach might also provide suggestions and autocompletion while developing, making the DX much smoother.
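A minimal sketch of what the metaprogramming option could look like, using __getattr__ and __dir__. OperatorRegistry and MediaHash are hypothetical stand-ins written for this example, not existing feluda classes, and the sha256-based run is just a toy operator.

```python
import hashlib

# Hypothetical sketch of attribute-style operator access.
# OperatorRegistry and MediaHash are illustrative stand-ins, not feluda classes.


class OperatorRegistry:
    """Exposes registered operators both as registry["name"] and registry.name."""

    def __init__(self):
        self._operators = {}

    def register(self, name, operator):
        if not name.isidentifier():
            raise ValueError(f"operator name {name!r} is not a valid attribute name")
        self._operators[name] = operator

    def __getitem__(self, name):
        return self._operators[name]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        # Private names are rejected so a half-built instance cannot recurse.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._operators[name]
        except KeyError:
            known = ", ".join(sorted(self._operators)) or "none"
            raise AttributeError(
                f"no operator named {name!r}; available operators: {known}"
            ) from None

    def __dir__(self):
        # Lets dir() and interactive tab completion list operator names.
        return sorted(set(super().__dir__()) | set(self._operators))


class MediaHash:
    """Toy operator with the run(media) shape used in this issue."""

    def run(self, media):
        return hashlib.sha256(media).hexdigest()


operators = OperatorRegistry()
operators.register("media_hash", MediaHash())
print(operators.media_hash.run(b"example bytes"))
```

One caveat: names resolved through __getattr__ show up in dir() and in REPL or notebook completion, but static analysers and most editors will not see them unless the attributes are declared explicitly or type stubs are generated.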

These are just a few that are on my mind; I am sure there are more things to standardize and finalize. The scope of this task is to list all inconsistencies, prioritize them, decide which need to be fixed, and fix those.

I also think that working on https://github.com/tattle-made/feluda/issues/410 will help surface inconsistencies faster. So we could start by working on some recipes and then come back to this issue.

dennyabrain commented 1 week ago

I was trying to use the resnet operator in its most barebones form to see which parts are essential and which can/should be moved to separate packages.

The code was organized like this: I created a feluda_user directory as a sibling of the feluda directory.

script.py contains the following code to find the closest match of an image compared to a bunch of other images.

# NOT TO BE USED EVENTUALLY: temporary path hack until feluda is installable
import sys
sys.path.append('../feluda/src')

# IMPORTS
from sklearn.metrics.pairwise import cosine_similarity
from core.feluda import Feluda
from core.models.media_factory import ImageFactory

feluda = Feluda("config.yml")
feluda.setup()
operator = feluda.operators.get()["image_vec_rep_resnet"]

# One embedding per image, image-0.png to image-5.png
embeddings = []

for i in range(6):
    file = ImageFactory.make_from_file_on_disk("image-"+str(i)+".png")
    embedding = operator.run(file)
    embeddings.append(embedding)

# Pairwise cosine similarity matrix (one row per image)
cos_sim = cosine_similarity(embeddings)
print(cos_sim)

# argsort is ascending, so in row 0 the last index is normally image-0 itself
# and the second-to-last index is its closest match
sim_sorted_doc_idx = cos_sim.argsort()
# print(sim_sorted_doc_idx.shape)
print(sim_sorted_doc_idx[0][len(embeddings)-1])
match_ix = sim_sorted_doc_idx[0][len(embeddings)-2]

print("closest matches are image-0.png and image-"+str(match_ix)+".png")
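The index arithmetic in the script relies on each image's similarity with itself sorting last. The dependency-free sketch below shows the same closest-match idea while skipping the query index explicitly. The helper names cosine and closest_match and the toy vectors are illustrative only and are not part of feluda.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_match(embeddings, query_ix=0):
    """Index of the embedding most similar to embeddings[query_ix], excluding itself."""
    query = embeddings[query_ix]
    candidates = [i for i in range(len(embeddings)) if i != query_ix]
    if not candidates:
        raise ValueError("need at least two embeddings to find a match")
    return max(candidates, key=lambda i: cosine(query, embeddings[i]))


# Toy 2-d vectors standing in for image embeddings.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
match_ix = closest_match(vectors, query_ix=0)
print("closest matches are image-0.png and image-" + str(match_ix) + ".png")
```

Excluding the query index directly avoids depending on how ties or floating point noise order the self-similarity entry.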

To run this I had to run the following in the terminal.

python -m venv .
source bin/activate
pip install -r requirements.txt

pip install PyYAML==6.0.2
pip install dacite==1.8.1
pip install pydub==0.25.1
pip install boto3==1.35.16
pip install wget==3.2
pip install werkzeug==3.0.3

pip install -r ../feluda/src/core/operators/image_vec_rep_resnet_requirements.txt

Ideally, it should have been just the following lines:

python -m venv .
source bin/activate
pip install -r requirements.txt
pip install -r ../feluda/src/core/operators/image_vec_rep_resnet_requirements.txt

which should eventually become:

python -m venv .
source bin/activate
pip install feluda
pip install feluda-op-vid-vec-resnet
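Until the split into separate packages happens, one way to make the extra manual installs less surprising is to report missing modules up front with a clear message. This is a sketch using only the standard library; missing_modules and require are hypothetical helpers, and the module names in the example are assumed import names (for instance, PyYAML is imported as yaml).

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def require(module_names, hint):
    """Raise ImportError naming every missing module, followed by an install hint."""
    missing = missing_modules(module_names)
    if missing:
        raise ImportError(f"missing modules: {', '.join(missing)}. {hint}")


# Example: check assumed import names before an operator is loaded.
absent = missing_modules(["yaml", "dacite", "pydub"])
if absent:
    print("not installed:", ", ".join(absent))
```

Only top-level names are passed to find_spec here, since looking up a dotted name imports its parent package first and can raise instead of returning None.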

The config.yml was:

operators:
  label: "Operators"
  parameters:
    - name: "image vectors"
      type: "image_vec_rep_resnet"
      parameters: { index_name: "text" }

The images used for this script are in the attached images.zip.

dennyabrain commented 2 days ago

@aatmanvaidya @plon-Susk7 can you see what we have to do to clean up the import statements for feluda?

Current API

from core.feluda import Feluda
from core.models.media_factory import ImageFactory

Preferred API

from core import Feluda
# or
import Feluda

from feluda.models.media_factory import ImageFactory

aatmanvaidya commented 2 days ago

Yes, I was planning to take this up as the next step. In the current PR we have basic packaging into a wheel file working and some project structure set up; next we do this!
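For reference, a shorter import path like the one preferred here is usually achieved by re-exporting the class from the package's __init__.py. The sketch below builds a throwaway package in a temporary directory so it can be run as is; demo_pkg and its contents are hypothetical and do not reflect feluda's real layout.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout, created in a temp directory so the example is runnable:
#   demo_pkg/__init__.py   re-exports Feluda
#   demo_pkg/feluda.py     defines Feluda
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "feluda.py").write_text(
    "class Feluda:\n"
    "    def __init__(self, config_path):\n"
    "        self.config_path = config_path\n"
)
# This re-export is the one line that shortens the import path.
(pkg / "__init__.py").write_text(
    "from .feluda import Feluda\n"
    "__all__ = ['Feluda']\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

from demo_pkg import Feluda  # short form, enabled by the re-export
from demo_pkg.feluda import Feluda as SameFeluda  # long form still works

print(Feluda is SameFeluda)
```

Keeping the long form working alongside the short one lets existing scripts continue to run while the public API settles.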