graemeg / xananews

A fantastic Windows NNTP client, written in Delphi.

request for clues on the message storage format #12

Open amlynnworth opened 6 years ago

amlynnworth commented 6 years ago

This is a documentation request. I would be very interested to find out what the message storage format is. Looking at some files on my disk from an old copy of XanaNews, I see *.dat files, and inside that I see reasonably human readable content with some binary separators like 0F 00 after the message number and before the path.

Does anyone have a write-up on these details, for the current XanaNews?

graemeg commented 6 years ago

I'll see if I can put some documentation together for you. It would be good to have such documentation inside the repository anyway.

amlynnworth commented 5 years ago

Hi - definitely still interested. I'm looking at compiling XanaNews with Delphi 10.2.3 and then 10.3 now - January.

Even general hints about what code to study in order to find details about the data format would be very much appreciated.

Thank you.

wilsoncpw commented 5 years ago

Wow - I'm amazed that people are still actively looking at XanaNews!

Here's the format of messages.dat...

The format is:
'X-Msg:' 6 char header
xxxxxxxx 8 char hex string containing message length
nn word length of first extra header
char (nn) nn char first extra header string
nn word length of second extra header string
char (nn) nn char second extra header string
...
nn word 0
Then follows the message - length xxxxxxxx

Colin (The original author)
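
A minimal sketch of reading one record laid out as described above. Python is used here only for illustration. Several details are assumptions not stated in the thread: that a "word" is a 16-bit little-endian unsigned integer (which would match the `0F 00` bytes noticed earlier in the thread), that the 8-character length is ASCII hex, and that the extra header strings are single-byte text. The name `read_record` is hypothetical and is not taken from the XanaNews source.

```python
import io
import struct

MAGIC = b"X-Msg:"  # 6 char header, per the description above


def read_record(stream):
    """Read one record; return (extra_headers, message_bytes), or None at end of stream."""
    magic = stream.read(len(MAGIC))
    if not magic:
        return None  # clean end of file
    if magic != MAGIC:
        raise ValueError(f"expected {MAGIC!r}, got {magic!r}")

    # 8 ASCII hex digits holding the message length
    length = int(stream.read(8).decode("ascii"), 16)

    # Length-prefixed extra headers, terminated by a zero-length word
    extra_headers = []
    while True:
        (n,) = struct.unpack("<H", stream.read(2))  # assumption: little-endian 16-bit word
        if n == 0:
            break
        extra_headers.append(stream.read(n).decode("latin-1"))  # assumption: single-byte text

    message = stream.read(length)
    if len(message) != length:
        raise ValueError("truncated message")
    return extra_headers, message
```

Calling `read_record` repeatedly on an open `messages.dat` (opened in binary mode) would walk the file record by record until it returns `None`, provided the assumptions above hold for the file in question.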

amlynnworth commented 5 years ago

And I am amazed to get such a succinct answer from you, Colin!

I maintain www.codenewsfast.com on a very back burner, volunteer basis. I want to replace the part of my process that downloads over NNTP with your XanaNews code for downloading. Everything within CodeNewsFast has been written in Delphi. And I know XanaNews is more reliable than the code I have been trying to maintain all these years. This way, I should be able to download to the XanaNews DAT files and process the articles from there, reliably and at my convenience, into the Firebird SQL database that holds the articles in the format I need for the public.

So I think this is for a good cause. 

Thank you so much for the info & have a great year. 

Ann ( Lynnworth of HREF Tools Corp. )


graemeg commented 5 years ago

Thanks Colin for your reply. Yes, XanaNews is still the best NNTP news client around! I even use it under FreeBSD and Linux via WINE (the Windows API Emulator).

@amlynnworth : I believe the code you are looking for is in the unitNNTPServices.pas unit.

amlynnworth commented 4 years ago

Hello again. Would anyone care to explain the other .dat file structure? What separates articles? Is there a fast way to index in and know which bytes to read for a particular message?

We have been studying unitNNTPServices.pas in the last week but have not fully figured this out yet.

amlynnworth commented 4 years ago

Okay, tab separates fields within an article line and CRLF separates the article-basic-fact lines.
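
A small sketch of that splitting rule, assuming the file is read as raw bytes and that a single-byte encoding such as Latin-1 is adequate for display (the thread does not state the encoding). The name `parse_articles` is hypothetical. Empty fields between consecutive tabs are kept, so field positions stay stable.

```python
def parse_articles(data: bytes, encoding: str = "latin-1"):
    """Split raw articles.dat content into a list of per-article field lists."""
    records = []
    for line in data.split(b"\r\n"):
        if not line:
            continue  # skip the empty piece after a trailing CRLF
        records.append(line.decode(encoding).split("\t"))
    return records
```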

This is the remaining puzzle. I will ask about one example which is easy to see, and tiny:

embarcadero.public.announce\articles.dat

It has only 3 articles from year 2009.

My question is about the trailing integer fields. I see the #Lines, and then 3 more integers. If someone could explain what those are, that would be great.

The content of that articles.dat file follows, with full respect to John Kaster.

5   ANN: Scheduled quick maintenance    John Kaster <>  Fri, 29 May 2009 22:22:29 GMT   <122742@forums.embarcadero.com>     261 10  33554496    0
8   ANN: System Alert: Server maintenance   John Kaster <>  Fri, 12 Jun 2009 22:37:24 GMT   <127103@forums.embarcadero.com>     248 10  33554496    604
9   ANN: Electrical power testing on Saturday, June 27, 2009    John Kaster <>  Sat, 27 Jun 2009 00:40:53 GMT   <132035@forums.embarcadero.com>     367 14  33554496    1195

In article #5, line count is 261. What do the integers at the end of that line, i.e. 10, 33554496, 0, refer to?

Many thanks.

Ann
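
One way to start investigating the trailing integers is to look at them in hexadecimal. The sketch below only does arithmetic on the values quoted above and does not answer what the fields mean. It shows that 33554496 has exactly two bits set, which is the shape a bit-flag field would have, and that the final column (0, 604, 1195) increases from one article to the next, which would be consistent with a byte offset into another file. Both readings are guesses, not confirmed anywhere in this thread. The helper name `set_bits` is hypothetical.

```python
def set_bits(value: int):
    """Return the positions of the bits that are set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


third_from_last = 33554496          # identical on all three sample lines
last_column = [0, 604, 1195]        # final field of articles 5, 8 and 9

print(hex(third_from_last))         # 0x2000040
print(set_bits(third_from_last))    # [6, 25]

# True when every value is larger than the one before it
print(all(a < b for a, b in zip(last_column, last_column[1:])))  # True
```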