taktix / Usenetr


Few questions about Usenetr #1

Open alextreppass opened 14 years ago

alextreppass commented 14 years ago

Trying to get a handle on the sorts of things you're doing here.

taktix commented 14 years ago

Sorry for the lack of docs; I was in a hurry to get my indexer up and running again.

I've updated the wiki with answers to your first question.

Yes, it's tightly integrated with Django. Django and Python are great tools: I built the site in a matter of hours. Of course, it helped that there were already libraries for Usenet and yEnc.

For now, SQL LIKE works well enough. There aren't enough records in the database, or enough users, for it to matter. I'm familiar with both Haystack and PyLucene and would consider using them if the need arose.
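
For anyone reading along, here is a minimal sketch of what a LIKE-based title search can look like. It uses the standard-library `sqlite3` module so it runs on its own; the `nzb` table, its `title` column, and the `search_titles` helper are made up for illustration and are not Usenetr's actual schema or code. In Django the equivalent would typically be an `icontains` lookup on the relevant field, which is translated to a LIKE query.

```python
import sqlite3


def search_titles(conn, term, limit=50):
    """Return titles containing `term`, using a plain SQL LIKE match."""
    # Escape LIKE wildcards so user input is matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT title FROM nzb WHERE title LIKE ? ESCAPE '\\' "
        "ORDER BY title LIMIT ?",
        (f"%{escaped}%", limit),
    )
    return [row[0] for row in rows]


# Hypothetical schema and sample rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nzb (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO nzb (title) VALUES (?)",
    [("Ubuntu 10.04 ISO",), ("Debian_6 netinst",), ("ubuntu server docs",)],
)

print(search_titles(conn, "ubuntu"))
```

A leading `%` means the database cannot use an ordinary index on the column, so every search scans the table. That is fine at a small scale, which matches the point above that it only matters once there are more records or users.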

Right now just NZBs, for technical reasons. It's too much data to store the titles of all posts. Maybe if I had more spare disk space.

You can't avoid crawling an entire newsgroup, even when you store just the title/ID of each post containing an NZB, because Usenet has no search capabilities. I've built the crawler so that it can crawl incrementally after the initial index is built. I'm going to expand this further.
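
To illustrate the incremental part, here is a rough sketch of the bookkeeping involved, not the crawler's actual code. It assumes you persist the highest article number already indexed per group, and that the server reports the group's first and last article numbers (the NNTP GROUP response includes both). The function name and parameters are hypothetical.

```python
def next_crawl_range(last_seen, server_first, server_last):
    """Return the inclusive (start, end) article numbers still to fetch.

    last_seen is the highest article number already indexed, or None
    if the group has never been crawled. Returns None when nothing new
    is available.
    """
    if last_seen is None:
        # Initial index: start from the oldest article the server still has.
        start = server_first
    else:
        # Resume after the last indexed article, skipping anything that
        # has expired from the server in the meantime.
        start = max(last_seen + 1, server_first)
    if start > server_last:
        return None
    return (start, server_last)


print(next_crawl_range(None, 100, 500))
print(next_crawl_range(450, 100, 500))
print(next_crawl_range(500, 100, 500))
```

The first crawl still has to walk the whole group, as noted above; later runs only touch the articles posted since the stored number.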

For the part of Usenet that doesn't have NZB files, I wanted a similar search function, but I can't determine a good way to make it work. Filenames seem too random for pattern matching to work well. If it could be made to work, you could temporarily store titles, remove them when a matching NZB post is found, and condense the rest into a list of post IDs. This would take considerable computing power, though.
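
Purely as a thought experiment, the store/remove/condense idea could be sketched like this. The matching rule is a big assumption: it only strips a trailing `(n/m)` segment counter from the subject and lowercases the rest, which is exactly the kind of naive pattern matching that probably breaks on real, messy filenames. `group_key` and `condense` are made-up names, not anything in Usenetr.

```python
import re
from collections import defaultdict

# Assumed convention: multi-part posts end with a "(n/m)" segment counter.
_PART_COUNTER = re.compile(r"\s*\(\d+/\d+\)\s*$")


def group_key(subject):
    """Normalize a subject so all parts of one upload share a key."""
    return _PART_COUNTER.sub("", subject).strip().lower()


def condense(posts, nzb_keys):
    """Group post IDs by normalized subject, skipping NZB-covered ones.

    posts is an iterable of (post_id, subject) pairs; nzb_keys is a set
    of keys for uploads that already have a matching NZB post.
    """
    pending = defaultdict(list)
    for post_id, subject in posts:
        key = group_key(subject)
        if key in nzb_keys:
            continue  # an NZB already describes this upload
        pending[key].append(post_id)
    return dict(pending)


posts = [
    ("<a1@example>", "holiday.avi yEnc (1/3)"),
    ("<a2@example>", "holiday.avi yEnc (2/3)"),
    ("<b1@example>", "notes.rar yEnc (1/1)"),
]
covered = {group_key("notes.rar yEnc (1/1)")}
print(condense(posts, covered))
```

Even in this toy form the titles have to be held in memory (or a temporary table) until the grouping is done, which is where the computing and storage cost mentioned above comes from.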