pathwaycom / llm-app

Dynamic RAG for enterprise. Ready to run with Docker, ⚡ in sync with Sharepoint, Google Drive, S3, Kafka, PostgreSQL, real-time data APIs, and more.
https://pathway.com/developers/templates/
MIT License
3.36k stars, 192 forks

Potential security issue #75

Closed psmoros closed 1 week ago

psmoros commented 4 months ago

Hello 👋

I run a security community that finds and fixes vulnerabilities in OSS. A researcher (@m0kr4n3) has found a potential issue, which I would be eager to share with you.

Could you add a SECURITY.md file with an e-mail address for me to send further details to? GitHub recommends a security policy to ensure issues are responsibly disclosed, and it would help direct researchers in the future.

Looking forward to hearing from you 👍

(cc @huntr-helper)

dxtrous commented 4 months ago

Hey @psmoros and @m0kr4n3, thanks, please reach out to the maintainers at the same e-mail address as for privacy breaches: https://pathway.com/privacy_gdpr_di/#breach-of-privacy. Happy to be in touch.

m0kr4n3 commented 4 months ago

Hey @dxtrous, thank you for your reply. Actually, I already did, and I haven't received a response from them yet.

dxtrous commented 4 months ago

Hey @m0kr4n3, thanks a lot for this.

This repo illustrates correct usage of Pathway data processing technology with sample backend data pipelines for AI. They are meant to be simple but include some design best practices. The issue you point out touches on which of the Streamlit-based UI code templates, bundled into this repo to demonstrate how to use the pipelines, might be suitable for "external" use, and which ones should be reserved for testing and learning purposes only.

We acknowledge the concern and will make a point to clarify this in documentation, to avoid disillusionment for novice users.

In general, Streamlit was created as a rapid prototyping tool for internal use by data teams, although in some cases it is possible to share demos on the cloud, which is where confusion starts to arise. (And yes, we actually do that ourselves, for the sake of demoing one of the pipelines from this repo: https://chat-realtime-sharepoint-gdrive.streamlit.app/.)

Thank you.

m0kr4n3 commented 4 months ago

Thank you for the clarification. I appreciate the insight into its intended usage and the acknowledgment of the concern raised.

While I understand that the Streamlit UI is primarily intended for demonstration purposes within trusted environments, I firmly believe that ensuring security even in examples is paramount. As you mentioned, some demos are shared on the cloud, which could potentially expose vulnerabilities. I am more than willing to contribute a patch via a pull request if you would like.

For a reputable company like Pathway, maintaining a proactive approach to security, even in example code, reflects positively on the overall reliability and trustworthiness of your products and services.

Additionally, I want to clarify that the issue resides in the logic behind handling uploaded files within the Streamlit app, rather than a flaw specific to Streamlit itself.

Once again, thank you for your time and understanding. I look forward to collaborating to ensure the continued security and integrity of Pathway's products.

dxtrous commented 4 months ago

> Additionally, I want to clarify that the issue resides in the logic behind handling uploaded files within the Streamlit app, rather than a flaw specific to Streamlit itself.

No doubts there. The UI bundled with the unstructured example was designed for single-user demo use only. Setting any avoidable issues aside, the entire flow of data in this specific UI does not make much sense for "non-localhost" use, nor for use by more than one user per server: it essentially acts as a local Python application with web browser controls on it. We hoped this was evident from the logic; however, we fully acknowledge that this could be misleading for some users, especially given that most other examples in this repo have a UI setup with a much cleaner flow of data. We'll probably scrap this UI example at some point soon or move it out to the pathway-labs GitHub organization. I'm keeping the issue open until this is done.

m0kr4n3 commented 4 months ago

I appreciate your clarification and I'll focus my efforts on exploring vulnerabilities within the core application moving forward.

Regarding the Pathway hoodies, that's awesome! Thank you so much, I'll gladly take you up on that offer. By the way, my colleague and I collaborated on the report, so two Pathway hoodies would be perfect! Could you please let me know where I can securely share my postal address and size with you?

dxtrous commented 4 months ago

Thanks a lot @m0kr4n3, I've reached out by e-mail.

dxtrous commented 1 month ago

@szymondudycz Following the recent repo structure review, would you consider the Streamlit UI separation now sufficient to close this issue?

szymondudycz commented 1 week ago

The UI for unstructured was changed, so this issue is no longer relevant.