Closed bruffridge closed 6 months ago
Note that file extensions are independent of how the file was created. When a file is uploaded to BIDARA, the file's name or path can be accessed, so the file type can be determined from the extension with a `determine_file_type()` function. Assuming you have a dictionary mapping file extensions to types, here is some sample code:
```python
import os

# Example mapping of file extensions to types (extend as needed)
extension_mappings = {
    "png": "image",
    "jpg": "image",
    "jpeg": "image",
    "txt": "text",
}

def determine_file_type(file_path):
    # Get the file extension
    _, file_extension = os.path.splitext(file_path)
    file_extension = file_extension.lower().strip(".")
    # Look up the extension in the dictionary
    return extension_mappings.get(file_extension, "Unknown")

# Function to handle a file upload in BIDARA
def handle_file_upload(uploaded_file_path):
    file_type = determine_file_type(uploaded_file_path)
    print(f"The uploaded file '{uploaded_file_path}' is of type: {file_type}")
```
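As a quick self-contained check (using a small, hypothetical extension mapping, since the thread assumes you supply your own dictionary):

```python
import os

# Hypothetical mapping for illustration only
extension_mappings = {"png": "image", "jpg": "image", "txt": "text"}

def determine_file_type(file_path):
    _, file_extension = os.path.splitext(file_path)
    file_extension = file_extension.lower().strip(".")
    return extension_mappings.get(file_extension, "Unknown")

print(determine_file_type("photo.PNG"))   # → image
print(determine_file_type("notes.docx"))  # → Unknown
```

Note that lowercasing the extension makes the lookup case-insensitive, so `photo.PNG` still resolves to an image.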
Hoping I didn't miss any detail, and this helps. Let me know.
If the image analysis fails even after fixing the file type, we could use the Mask R-CNN model that I previously used for object detection and segmentation in computer vision.
I found the COCO (Common Objects in Context) dataset. It is used as the training set for Mask R-CNN, which can then classify objects, returning integer class IDs that identify each class; the COCO dataset assigns a unique value to each of its classes. Here's an example of object detection code:
```python
import os
import random
import skimage.io
from mrcnn import visualize  # Matterport Mask R-CNN utilities

# IMAGE_DIR, model, and class_names come from the Mask R-CNN demo setup
# Load a random image from the images folder
file_names = next(os.walk(IMAGE_DIR))[2]  # filenames directly inside IMAGE_DIR
image = skimage.io.imread(os.path.join(IMAGE_DIR, random.choice(file_names)))

# Run detection (verbose=1 prints progress/debug information)
results = model.detect([image], verbose=1)

# Visualize results; pass show_mask=False to skip highlighting object pixels
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            class_names, r['scores'])
```
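The returned class IDs are simply indices into the class-name list that the COCO-trained demo uses (only the first few entries are shown here; index 0 is the background class):

```python
# First entries of the COCO class list used by the Mask R-CNN demo;
# a detected class_id is an index into this list (0 is background).
class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane']

def class_id_to_name(class_id):
    return class_names[class_id]

print(class_id_to_name(1))  # → person
print(class_id_to_name(3))  # → car
```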
Thank you for the reply.
The issue isn't with the client knowing the file type, because we could determine it exactly as you said. The issue is with the assistant knowing the file type, as there's no way to pass information to the assistant without it being a user message.
The flow with deep-chat and assistants works something like this:
As you can see, even if we have the file extension and thus the file type, we can't share this information with the assistant directly.
As mentioned previously, the only way to give the file type to the assistant without it being in the user message (as far as I'm aware) is through a function call. This fact would remain even in the case of using other image recognition software, as the assistant itself still has to make that call.
Let me know if you have any ideas with this, or questions. We can also discuss when we meet 😊
This was somewhat a "design choice". There's no way for the assistant to automatically know the file type, and thus that it's an image. Options:
- Tell it that the file is an image
- Function call to determine type of file
Downsides:
- Only way to tell it is with a user message, which must appear in the chat. This would look something like an image with a text below it that says "(user uploaded an image)". This did not seem ideal.
- Function call results in a rather significant additional "load" time to response, and cost, for every file uploaded because it has to make an additional function call.
I could do either one of these, and would prefer the function call, but neither are ideal.
The current solution relies on somehow mentioning it is an image. Based on my own testing, what you said would work almost all of the time because you mentioned "image". Not sure why it didn't in this case...
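For concreteness, the function-call option would mean registering a tool along these lines with the assistant — a sketch in OpenAI's function-calling schema; the name and fields here are illustrative, not the actual implementation:

```python
# Sketch of a tool the assistant could call to learn an uploaded file's type.
# Follows OpenAI's function-calling schema; names and fields are illustrative.
determine_file_type_tool = {
    "type": "function",
    "function": {
        "name": "determine_file_type",
        "description": "Return the type of an uploaded file (e.g. 'image') "
                       "based on its extension.",
        "parameters": {
            "type": "object",
            "properties": {
                "file_path": {
                    "type": "string",
                    "description": "Name or path of the uploaded file",
                },
            },
            "required": ["file_path"],
        },
    },
}
```

Each upload would then incur an extra model round-trip to invoke the tool, which is the latency and cost downside described above.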
Pushed a change with the mentioned function call.
As I said, there is a time hit involved with this. It feels better that the assistant already knows what the file is, but at the same time it feels worse to wait that long. Maybe it'll feel better once streaming is implemented (properly) by deep-chat.
Though you could also argue that in some instances it saves time, like yours, where it doesn't know and you have to tell it again what kind of file it is.
I must be doing something wrong.