gqoew opened 3 months ago
Looking forward to it. Hopefully, we can add the ability to process files in custom pipelines as soon as possible. This will greatly enhance the scalability of the project. Is there anything I can do? I'd like to help.
Hey, is there a way to get the files the user has selected in the pipelines class? Currently the only arguments are `user_message`, `model_id`, `messages` and `body`. In the default RAG pipeline, information such as file names and collection names is provided, i.e. which file/collection the user has selected in the message. Can this information also be accessed in Pipelines?
I have the same problem. Not having access to user-uploaded files limits a lot of functionality 😶 It's also hard to get other parameters passed by the front end, such as whether a new session has been created (which bothers me: even when a new session is created, there is no way to start a fresh context), likes or dislikes, etc.
You can access uploaded files by adding an `inlet` function. If you upload a file, you should see it in the body:
```python
async def inlet(self, body: dict, user: dict) -> dict:
    # This function is called before the OpenAI API request is made. You can modify the form data before it is sent to the OpenAI API.
    print(f"inlet:{__name__}")
    print(body)
    print(user)
    return body
```
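To get at the information asked about above (file names, collection names), a small helper can walk `body["files"]`. This is a sketch that assumes each entry wraps the stored file record under a `"file"` key with `meta.name` and `meta.collection_name`, matching the JSON dump posted further down this thread; `summarize_files` is a hypothetical helper, and the payload shape may differ between Open WebUI versions.

```python
from typing import Any, Dict, List


def summarize_files(body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect id, name, collection and size for each file attached to a request.

    Assumes each entry of body["files"] wraps the stored file record under "file"
    (shape observed in this thread; it may differ between Open WebUI versions).
    """
    summary = []
    for entry in body.get("files") or []:
        record = entry.get("file", {})
        meta = record.get("meta", {})
        summary.append(
            {
                "id": record.get("id"),
                "name": meta.get("name"),
                "collection_name": meta.get("collection_name"),
                "size": meta.get("size"),
            }
        )
    return summary
```

Inside `inlet` you could then log `summarize_files(body)` instead of the whole body.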
@InquestGeronimo Sorry for pinging you, but did the API change? Some weeks ago I tried to make an example pipeline, and it errored out as soon as I attached an image (#66). Is it now "supported"?
That's the main thing that's holding me back from integrating pipelines instead of OpenAI so far. I don't want to lose image capabilities.
EDIT: Looks like something HAS changed! The pipeline doesn't error out anymore. Yay! Guess I'll be using Pipelines now!
@tjbck Care to close this issue? I'm not OP but I guess this is solved.
Here is a hacky way to access uploaded files. Define an `inlet` function as suggested by @InquestGeronimo and query each file's `/content` URL:
```python
async def inlet(self, body: dict, user: dict) -> dict:
    print(f"Received body: {body}")
    files = body.get("files", [])
    for file in files:
        content_url = file["url"] + "/content"
        print(f"file available at {content_url}")
        # read the file content as binary and do something ...
    return body
```
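The `/content` URL built above still has to be fetched. A minimal standard-library sketch, assuming `file["url"]` is either absolute or a path relative to the Open WebUI server, and that the endpoint accepts an API key as a bearer token; both assumptions, the base URL and the helper names are hypothetical, so check them against your deployment.

```python
import urllib.parse
import urllib.request


def build_content_request(file: dict, base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the raw bytes of an uploaded file.

    Assumes file["url"] is absolute or relative to base_url, and that the
    server accepts a bearer token (assumptions, not confirmed in this thread).
    """
    # urljoin keeps an absolute file["url"] as-is and resolves a relative one
    content_url = urllib.parse.urljoin(base_url, file["url"] + "/content")
    return urllib.request.Request(
        content_url, headers={"Authorization": f"Bearer {api_key}"}
    )


def fetch_file_bytes(file: dict, base_url: str, api_key: str, timeout: float = 30.0) -> bytes:
    """Download the original file content as bytes (performs a network call)."""
    request = build_content_request(file, base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Note that `urlopen` is blocking; calling it from an `async def inlet` stalls the event loop, so for large files an async HTTP client may be a better fit.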
Hi @InquestGeronimo, your solution works for me, thanks! I still have an issue: the data does not seem to be passed as-is. Do you know why? Is there a way to get the original file content?
Here is the original data:
```csv
PlayerID;FirstName;LastName;Team;Position;Goals;Assists;Appearances
1;Leo;Messi;Paris Saint-Germain;Forward;672;305;786
2;Cristiano;Ronaldo;Al Nassr;Forward;700;223;900
3;Neymar;Da Silva Santos;Al Hilal;Forward;398;200;600
4;Kylian;Mbappe;Paris Saint-Germain;Forward;300;150;400
5;Robert;Lewandowski;FC Barcelona;Forward;500;150;700
6;Kevin;De Bruyne;Manchester City;Midfielder;100;200;500
7;Luka;Modric;Real Madrid;Midfielder;120;170;600
8;N'Golo;Kante;Chelsea;Midfielder;30;80;400
9;Ruben;Dias;Manchester City;Defender;10;20;250
10;Virgil;Van Dijk;Liverpool;Defender;20;15;250
```
And here is what I got from the pipeline:
```json
{
  "id": "6547e61d-dc1d-4544-a4fa-b796d40303e5",
  "user_id": "80ce7079-c367-41e5-89f7-7de8534b90e4",
  "hash": "75e40889b84327411325d75964484104733eb18c58ff14ff1d0c8f057defa1e0",
  "filename": "6547e61d-dc1d-4544-a4fa-b796d40303e5_players.csv",
  "data": {
    "content": "PlayerID: 1\nFirstName: Leo\nLastName: Messi\nTeam: Paris Saint-Germain\nPosition: Forward\nGoals: 672\nAssists: 305\nAppearances: 786 PlayerID: 2\nFirstName: Cristiano\nLastName: Ronaldo\nTeam: Al Nassr\nPosition: Forward\nGoals: 700\nAssists: 223\nAppearances: 900 PlayerID: 3\nFirstName: Neymar\nLastName: Da Silva Santos\nTeam: Al Hilal\nPosition: Forward\nGoals: 398\nAssists: 200\nAppearances: 600 PlayerID: 4\nFirstName: Kylian\nLastName: Mbappe\nTeam: Paris Saint-Germain\nPosition: Forward\nGoals: 300\nAssists: 150\nAppearances: 400 PlayerID: 5\nFirstName: Robert\nLastName: Lewandowski\nTeam: FC Barcelona\nPosition: Forward\nGoals: 500\nAssists: 150\nAppearances: 700 PlayerID: 6\nFirstName: Kevin\nLastName: De Bruyne\nTeam: Manchester City\nPosition: Midfielder\nGoals: 100\nAssists: 200\nAppearances: 500 PlayerID: 7\nFirstName: Luka\nLastName: Modric\nTeam: Real Madrid\nPosition: Midfielder\nGoals: 120\nAssists: 170\nAppearances: 600 PlayerID: 8\nFirstName: N'Golo\nLastName: Kante\nTeam: Chelsea\nPosition: Midfielder\nGoals: 30\nAssists: 80\nAppearances: 400 PlayerID: 9\nFirstName: Ruben\nLastName: Dias\nTeam: Manchester City\nPosition: Defender\nGoals: 10\nAssists: 20\nAppearances: 250 PlayerID: 10\nFirstName: Virgil\nLastName: Van Dijk\nTeam: Liverpool\nPosition: Defender\nGoals: 20\nAssists: 15\nAppearances: 250"
  },
  "meta": {
    "name": "players.csv",
    "content_type": "text/csv",
    "size": 579,
    "path": "/app/backend/data/uploads/6547e61d-dc1d-4544-a4fa-b796d40303e5_players.csv",
    "collection_name": "file-6547e61d-dc1d-4544-a4fa-b796d40303e5"
  },
  "created_at": 1729251218,
  "updated_at": 1729251218
}
```
Regards
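Judging from the output above, `data.content` is not the raw file: each CSV row has been rewritten as `Column: value` lines, presumably by the document loader used for RAG. If you need the original rows, one option is to read the file from `meta.path`. This sketch assumes the pipelines server can read the same upload directory (e.g. through a shared volume), which is not the case when Pipelines runs in a separate container without one; otherwise fetch the bytes from the file's `/content` URL instead. `read_original_csv` is a hypothetical helper, and the `;` delimiter matches the sample data above.

```python
import csv
from pathlib import Path
from typing import Dict, List


def read_original_csv(file_record: dict, delimiter: str = ";") -> List[Dict[str, str]]:
    """Parse the original uploaded CSV from disk into a list of row dicts.

    Assumes file_record["meta"]["path"] is readable from this process
    (e.g. a volume shared with Open WebUI); that is an assumption.
    """
    path = Path(file_record["meta"]["path"])
    # newline="" is what the csv module expects; utf-8-sig strips a BOM if present
    with path.open(newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))
```

Inside `inlet`, this would be called as `read_original_csv(entry["file"])` for each entry of `body["files"]`.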
I have "solved" the issue with this approach. It works when files are uploaded inside the chat:
```python
import logging
import os
from dataclasses import dataclass
from typing import Generator, Iterator, List, Optional, Union

from pydantic import BaseModel

logger = logging.getLogger(__name__)


@dataclass
class OIFile:
    # Minimal holder for an uploaded file (assumed definition; not shown in the original)
    id: str
    filename: str
    content: str


class Pipeline:
    class Valves(BaseModel):
        # myValves... (placeholders; these two are used in pipe() below)
        LLM_MAX_TOKENS: int = 1024
        LLM_TEMPERATURE: float = 0.7

    def __init__(self):
        self.name = "pipeline_custom_name"
        self.valves = self._initialize_valves()
        # In-memory dictionary linking each user ID to the files they uploaded
        self.file_contents = {}

    def _initialize_valves(self) -> Valves:
        """Initialize valves using environment variables."""
        # Environment variable names and defaults are placeholders
        return self.Valves(
            LLM_MAX_TOKENS=int(os.getenv("LLM_MAX_TOKENS", "1024")),
            LLM_TEMPERATURE=float(os.getenv("LLM_TEMPERATURE", "0.7")),
        )

    async def on_startup(self):
        """Called when the server is started."""
        logger.info(f"Server {self.name} is starting.")

    async def on_shutdown(self):
        """Called when the server is stopped."""
        logger.info(f"Server {self.name} is shutting down.")

    async def on_valves_updated(self):
        """Called when the valves are updated."""
        logger.info("Valves updated.")

    async def inlet(self, body: dict, user: dict) -> dict:
        """Modifies form data before the OpenAI API request."""
        logger.info("Processing inlet request")
        # Extract file info for all files in the body and store it per user
        file_info = self._extract_file_info(body)
        self.file_contents[user["id"]] = file_info
        return body

    def _extract_file_info(self, body: dict) -> list:
        """Extracts the file info from the request body for all files."""
        files = []
        for file_data in body.get("files", []):
            file = file_data["file"]
            file_id = file["id"]
            filename = file["filename"]
            file_content = file["data"]["content"]
            # Create an OIFile object and append it to the list
            files.append(OIFile(file_id, filename, file_content))
        return files

    def pipe(
        self, body: dict, user_message: str, model_id: str, messages: List[dict]
    ) -> Union[str, Generator, Iterator]:
        logger.info("Starting PIPE process")
        # Extract parameters from body with default fallbacks
        stream = body.get("stream", True)
        max_tokens = body.get("max_tokens", self.valves.LLM_MAX_TOKENS)
        temperature = body.get("temperature", self.valves.LLM_TEMPERATURE)
        # Extract user ID from the body
        user_id = body.get("user", {}).get("id", "")
        # Files stored by inlet() for this user, or None if there are none
        user_files = self.file_contents.get(user_id)
        # DO YOUR STUFF: placeholder result, replace with your own processing
        result = f"Received {len(user_files or [])} file(s)."
        return result

    async def outlet(self, body: dict, user: Optional[dict] = None) -> dict:
        print(f"outlet:{__name__}")
        print(f"Received body: {body}")
        # Drop this user's files once the response has been produced
        if user is not None:
            self.file_contents.pop(user.get("id"), None)
        return body
```
Open WebUI calls `inlet`, `pipe` and `outlet` every time the user sends a query to the pipeline. If you create a custom model (from the UI) that uses your pipeline as its base model, Open WebUI only calls the `pipe` method (I don't understand why).
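Because `inlet` is skipped when the pipeline backs a custom model (as observed above), the per-user store can be empty by the time `pipe` runs. A defensive sketch: `pipe` first uses whatever `inlet` stored and otherwise tries to read the files straight from `body`. This assumes `body["files"]` is still present when `pipe` is called in that case, which has not been confirmed in this thread; `FileAwarePipeline` and `_files_from_body` are hypothetical names, while `inlet`/`pipe`/`outlet` are the Pipelines hooks discussed here.

```python
from typing import Dict, List, Optional, Tuple


class FileAwarePipeline:
    """Illustrative pipeline that tolerates pipe() being called without inlet()."""

    def __init__(self) -> None:
        # user ID -> list of (filename, extracted text) pairs stored by inlet()
        self.file_contents: Dict[str, List[Tuple[str, str]]] = {}

    @staticmethod
    def _files_from_body(body: dict) -> List[Tuple[str, str]]:
        # Same fields as the snippet above: filename and extracted text per file
        return [
            (entry["file"]["filename"], entry["file"]["data"]["content"])
            for entry in body.get("files") or []
        ]

    async def inlet(self, body: dict, user: dict) -> dict:
        self.file_contents[user["id"]] = self._files_from_body(body)
        return body

    def pipe(self, body: dict, user_message: str, model_id: str, messages: List[dict]) -> str:
        user_id = body.get("user", {}).get("id", "")
        # Prefer what inlet() stored; fall back to the body when inlet() was skipped
        files = self.file_contents.get(user_id) or self._files_from_body(body)
        return ", ".join(name for name, _ in files) or "no files"

    async def outlet(self, body: dict, user: Optional[dict] = None) -> dict:
        if user is not None:
            self.file_contents.pop(user.get("id"), None)
        return body
```

Keying the store by user ID (as in the snippet above) also means two concurrent chats from the same user overwrite each other's entry.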
Hi there,
Being able to access uploaded files would be a great addition to pipelines. It would greatly expand their potential by removing the limitation to text input.
It would also be great to enable pipelines to return files in the chat as well.
Is there any plan to move this feature forward in the near future? I would be happy to test it.
Related issues: #66 #19 #81