Closed gerritv closed 7 years ago
The server itself is built around index files called inpx and doesn't provide a way to scan the filesystem itself. I haven't tested any scenario other than fb2-ready inpx files, so there might be (and will be, I'm sure 😄) bugs.
But basically you need to:
1. Create an .inpx file in some way for every root directory (i.e. if you have c:\lib1 and c:\lib2 you'll need two files).
2. If you need data the inpx format doesn't carry (cover, annotation), implement IBookParser and register it with BookParsersPool.
3. Import each .inpx with its related root (i.e. dotopds import c:\lib1 lib1.inpx).
I found the .inpx format description only in Russian, so here is a translation.
Thank you, that helps me a lot. I have been reading the code and understand more than when I opened the Issue :-) I can generate the .inpx from my PDF parser, will test that out and then decide what to do next. I am impressed with the design, it looks very expandable.
I have the pdf scanner added (Utils/PdfParser.cs), I chose to recursively scan the directory and process each pdf rather than creating an intermediate file. I didn't add another parser to Parsers, the generic one there is sufficient as the Class in Utils does all the work, using InpxParser.cs as a template.
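The recursive scan described above could look roughly like the sketch below. It is only an illustration, not the code in Utils/PdfParser.cs: the class and method names (PdfScanSketch, FindPdfs) are made up for this example, and it only finds the files, leaving metadata extraction to whatever PDF library is in use.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of DotOPDS.
public static class PdfScanSketch
{
    // Lazily yields every PDF under root, descending into all subdirectories.
    // The extension is compared case-insensitively so "b.PDF" is found too.
    public static IEnumerable<FileInfo> FindPdfs(string root)
    {
        foreach (var path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            if (string.Equals(Path.GetExtension(path), ".pdf", StringComparison.OrdinalIgnoreCase))
            {
                yield return new FileInfo(path);
            }
        }
    }
}
```

Each FileInfo returned by PdfScanSketch.FindPdfs(@"c:\lib1") can then be handed to the PDF metadata reader one at a time, so no intermediate index file is needed.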
Pondering how to add it to the commands. Would it be better to create another Class in Tasks called PdfScanTask and then a 'pdfscan' command to run it? Much or most of the code in PdfScanCommand.cs would be the same as ImportCommand.cs. I had thought of generalizing ImportTask to make it take an option indicating what to import but that got more complex.
Ok, upon further pondering over an espresso I modified ImportTask and ImportCommand:
You can see my code changes so far in https://github.com/gerritv/DotOPDS. Scanning of PDFs is working, but I can't get queries working via Aldiko. I tried forcing all books/PDFs to have Genre other,other but still no joy. So, my next question is: where can I learn about using Owin and System.Web.Http to create some different web pages for serving pages?
Hey Gerrit,
The genre should be its id, not a human-readable string. You should pick one of the ids from the statements like list.Add("sf_history"); in Genres.cs.
And your Book model will look like this:
var args = new Book
{
    Authors = new[] { author },
    Genres = new[] { "other" },
    Title = info.Title,
    File = Path.GetFileNameWithoutExtension(fi.FullName),
    Size = (int)fi.Length,
    Ext = "pdf",
    Date = info.CreationDate,
    Language = "en",
    Keywords = info.Keywords.Split(','),
    Archive = "",
};
I've also pushed some fixes to master, you should pull it.
And there is one problem I can't figure out yet: LuceneImporter always uses RussianAnalyzer for now, as there is neither language autodetection nor a good way to populate it on import.
Thank you for those fixes/changes. I now have things sort of working using FBReader. Aldiko and OPDSViewer don't like whatever is being returned. I also need to work on File pathname as my files can be in sub directory off Library Path. Your solution above strips out the intermediate directories. My initial method was also wrong as it resulted in Library Path existing twice in the download link.
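One way to keep the intermediate directories without repeating the Library Path is to store the file path relative to the library root. The sketch below assumes that the File field may hold such a relative path without its extension (the thread does not confirm this), and the names LibraryPathSketch and RelativeWithoutExtension are invented for the example. The case-insensitive prefix check assumes a Windows-style file system.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of DotOPDS.
public static class LibraryPathSketch
{
    // Returns fullPath relative to libraryRoot, with the extension removed
    // and any intermediate subdirectories preserved.
    public static string RelativeWithoutExtension(string libraryRoot, string fullPath)
    {
        // Normalize the root so it always ends with exactly one separator.
        var root = Path.GetFullPath(libraryRoot)
            .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
            + Path.DirectorySeparatorChar;
        var full = Path.GetFullPath(fullPath);

        if (!full.StartsWith(root, StringComparison.OrdinalIgnoreCase))
        {
            throw new ArgumentException("File is not inside the library root.", nameof(fullPath));
        }

        // Drop the root prefix, then strip the extension from the remainder.
        var relative = full.Substring(root.Length);
        return Path.ChangeExtension(relative, null);
    }
}
```

With this, File = LibraryPathSketch.RelativeWithoutExtension(libraryRoot, fi.FullName) would give sub\deep\book for c:\lib1\sub\deep\book.pdf under the root c:\lib1, so the root appears only once when the download link is built.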
I will close this Issue as I am now well past the original question. I would though appreciate a link or book or something where I can learn about WebApi2/Owin/Nowin in English (or Dutch)
I learned WebApi 2 from official docs. Nowin/OWIN is pretty straightforward through Nowin samples and OWIN spec.
> Your solution above strips out the intermediate directories.
Yeah, I don't remember all the .net apis but you get the point 😉
Thx, The Message LifeCycle diagram is a huge help.
Yes, I got it :-) My setup is a bit unusual. Now trying to figure out how to make some Pull requests without feeding you my pdf solution. (It relies on DebenuPDFLite, which is a bit of a pain to install but is free). Looking at git cherry-pick.
I would like to add to your implementation an Importer for pdf files. It would get meta data from the PDF file itself. I envisage using multiple source directories (I don't want to move the files from where they are located.) and recurse into them as deep as necessary.
Do I need to only implement something like LuceneImporter or do I need to change code elsewhere to allow choosing an Importer? E.g. in ImportTask.cs the importer is hardcoded to Lucene.