Open mharoot opened 6 years ago
After taking out 54,000 folders from the cache directory I get a different error:
<--- Last few GCs --->
215777 ms: Scavenge 1408.3 (1447.1) -> 1408.3 (1447.1) MB, 0.7 / 0 ms (+ 6.2 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
215826 ms: Mark-sweep 1408.3 (1447.1) -> 1374.1 (1413.2) MB, 49.3 / 0 ms (+ 6.7 ms in 2 steps since start of marking, biggest step 6.2 ms) [last resort gc].
215868 ms: Mark-sweep 1374.1 (1413.2) -> 1374.1 (1413.2) MB, 41.6 / 0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x816f69b4629
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted
I commented out the map calls and it began to enter the data into the CouchDB database. Why does the program crash the first time the jQuery-style map function is called inside rdfParser.js?
// authors: $('pgterms\\:agent pgterms\\:name').map(collect),
// subjects: $('[rdf\\:resource$="/LCSH"] ~ rdf\\:value').map(collect)
I can see that the map function clearly works for getting the authors, but the result also carries all the other extras (I am not sure whether this is a problem):
beginning directory walk, importing data to database books
{ _id: '1',
  title: 'The Declaration of Independence of the United States of America',
  authors:
   { '0': 'Jefferson, Thomas',
     options: { withDomLvl1: true, normalizeWhitespace: false, xml: false, decodeEntities: true },
     _root: { '0': [Object], options: [Object], length: 1, _root: [Circular] },
     length: 1,
     prevObject: { '0': [Object], options: [Object], _root: [Object], length: 1, prevObject: [Object] } } }
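One possible reason the extras matter shows up once the record is serialized. The sketch below does not use cheerio at all: `fakeSelection` is a hand-built, hypothetical stand-in shaped like the `authors` value in the output above, including a self-referencing `_root`. It only illustrates that native `JSON.stringify` rejects such a wrapper while a plain array of the same names serializes without trouble.

```javascript
'use strict';

// Hypothetical stand-in for the logged `authors` value; NOT a real cheerio object.
const fakeSelection = {
  0: 'Jefferson, Thomas',
  length: 1,
  options: { xml: false, decodeEntities: true }
};
fakeSelection._root = { 0: {}, length: 1 };
fakeSelection._root._root = fakeSelection._root; // mimics `_root: [Circular]`

// Serializing the wrapper: native JSON.stringify throws on circular references.
let wrapperError = null;
try {
  JSON.stringify({ authors: fakeSelection });
} catch (err) {
  wrapperError = err;
}

// Serializing only the data we actually want to store.
const plainJson = JSON.stringify({ authors: ['Jefferson, Thomas'] });

console.log(wrapperError instanceof TypeError); // true
console.log(plainJson); // {"authors":["Jefferson, Thomas"]}
```

A cycle-tolerant serializer avoids the exception, but it still has to walk every extra property on the wrapper, so the plain array is the safer thing to hand to the database.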
'use strict';
const
  fs = require('fs'),
  cheerio = require('cheerio');

/**
 * Like the request module we used earlier, this module sets its exports to a function.
 * Users of the module will call this function, passing in a path to a file and a
 * callback to invoke with the extracted data.
 */
module.exports = function(filename, callback) {

  // Copy the indexed entries of a cheerio result into a plain array,
  // leaving behind the wrapper's extra properties (options, _root, prevObject).
  function extract_array($obj) {
    let obj_array = [];
    for (let i = 0; i < $obj.length; i++) {
      obj_array.push($obj[i]);
    }
    return obj_array;
  }

  // The main module function reads the specified file asynchronously, then loads the data into cheerio.
  fs.readFile(filename, function(err, data) {
    if (err) {
      callback(err);
      return;
    }
    let
      // cheerio gives back an object we assign to the $ variable. This object works much like
      // the jQuery global function $: it provides methods for querying and modifying elements.
      $ = cheerio.load(data.toString()),
      // The collect function is a utility method for extracting an array of text nodes from a set of element nodes.
      collect = function(index, elem) {
        return $(elem).text();
      };

    // The bulk of the logic for this module is encapsulated in these four lines.
    callback(null, {
      // We look for the <pgterms:ebook> tag, read its rdf:about attribute, and pull out just the numerical portion.
      _id: $('pgterms\\:ebook').attr('rdf:about').replace('ebooks/', ''),
      // We grab the text content of the <dcterms:title> tag.
      title: $('dcterms\\:title').text(),
      // We find all the <pgterms:name> elements under a <pgterms:agent>.
      authors: extract_array( $('pgterms\\:agent pgterms\\:name').map(collect) ),
      // Lastly, the original code used the sibling operator (~) to find the <rdf:value> elements that are
      // siblings of any element whose rdf:resource attribute ends in LCSH, and collected their text contents.
      // That selector found no subjects, so it is commented out and replaced below.
      //subjects: $('[rdf\\:resource$="/LCSH"] ~ rdf\\:value').map(collect)
      subjects: extract_array( $('dcterms\\:subject rdf\\:Description rdf\\:value').map(collect) )
    });
  });
};
I basically added a function (extract_array) to get rid of all the extra junk in the map result, and the import started working for the massive number of files in the cache/epub directory. Also, the subjects were not being found by the original selector, so I changed it.
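The idea behind extract_array is to copy only the indexed entries of an array-like object into a real array. Here is a minimal, cheerio-free sketch of that idea; `arrayLike` and its contents are made up for illustration (the second name is invented sample data), and `extractArray` simply mirrors the helper in the listing.

```javascript
'use strict';

// Hypothetical array-like object: indexed entries plus unrelated extra properties.
const arrayLike = {
  0: 'Jefferson, Thomas',
  1: 'Example, Author', // invented entry, only here to show more than one item
  length: 2,
  options: { xml: false },
  prevObject: {}
};

// Same approach as extract_array: walk indices 0..length-1 and push each entry.
function extractArray(obj) {
  const result = [];
  for (let i = 0; i < obj.length; i++) {
    result.push(obj[i]);
  }
  return result;
}

const viaLoop = extractArray(arrayLike);
// The built-in Array.from does the same copy for any object with a length.
const viaArrayFrom = Array.from(arrayLike);

console.log(Array.isArray(viaLoop), viaLoop);
console.log(Array.isArray(viaArrayFrom), viaArrayFrom);
```

cheerio also documents a `.get()` method that returns the matched items as a plain array, so `.map(collect).get()` may be a drop-in replacement for the helper. I have not checked that against the cheerio version installed here.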
output:
beginning directory walk
/home/michael/Documents/DistributedSystemsNodeJS/Databases/node_modules/json-stringify-safe/stringify.js:5
return JSON.stringify(obj, serializer(replacer, cycleReplacer), spaces)
^
RangeError: Invalid string length
I am not sure why this is not working correctly. I took the code directly from your GitHub repository and got stuck here. I know this code was added 4 years ago, so I'm guessing some of the newer Node.js updates made this program crash? Are there any newer books I can follow? I learned a lot from 'Node.js the Right Way', but I'm stuck at this point. Thanks in advance.