opencog / link-grammar

The CMU Link Grammar natural language parser
GNU Lesser General Public License v2.1

Low memory handling - unimplemented #537

Open s-e-r-g opened 7 years ago

s-e-r-g commented 7 years ago

1) There are a lot of unchecked malloc/realloc/strdup/etc. calls. They often cause crashes on 32-bit platforms.

2) xalloc/exalloc aborts/exits the process in a low-memory situation. But API users expect an error to be returned, not the process to exit.
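For illustration, the expected behavior is roughly the following (a sketch with made-up helper names, not the actual library code):

#include <stdlib.h>

/* Sketch only: propagate an allocation failure to the caller
 * instead of calling exit()/abort() inside the allocator. */
static void *checked_malloc(size_t size)
{
    return malloc(size);    /* may return NULL; the caller must check */
}

static int some_parse_step(size_t len)
{
    char *buf = checked_malloc(len);
    if (buf == NULL)
        return -1;          /* "out of memory" error returned to the API user */
    /* ... do the work ... */
    free(buf);
    return 0;
}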

REPRODUCTION: with the "ru" dictionary and default options in link-parser.exe (Win32), try to check "Сдержанная улыбка, игравшая постоянно на лице Анны Павловны, хотя и не шла к ее отжившим чертам, выражала, как у избалованных детей, постоянное сознание своего милого недостатка, от которого она не хочет, не может и не находит нужным исправляться." (without the quotes)

linas commented 7 years ago

Ah, interesting. It certainly is very slow on that sentence, and memory then shoots up to 6.6GB on my machine. So, yes, this is an example where the current dictionaries are lacking.

I have no plans to fix this; I have other worries burning up my time. Are you a programmer? Do you know anyone who programs? I'll accept any reasonably clean fix for this -- as long as it doesn't make a big mess or introduce obvious bugs, that would be fine. (Well, also follow the indentation style; don't send sloppy, careless code ...)

ampli commented 7 years ago

1) I found the cause of the vast memory use/slowness. The linkage of this sentence has 4 null words. It also includes many words with the same LL links. On the pruning pass for null_count=0, the LL+ connectors can only reach the next word, so much gets pruned. However, the pruning that is done for null_count>0 is not effective for LL links, because they are then allowed to link far away.

Setting the length of the LL links to 1 solves the problem. Due to the overhead of that setup (a loop over a large number of disjuncts/connectors), it may be a good idea to set it only for the pruning step of null_count>0, and even then only for long enough sentences (or set the connector length in another way, e.g. when the connectors are created).

On the same occasion I can maybe add an affix-class for morphology links. Is it reasonable to assume their connector length, if limited, is always 1? Also, it may be a good idea to allow setting the connector length for individual links, not only by link prefix. So a flexible (and simple) definition scheme is needed. I can just do something "reasonable", unless you have specific ideas regarding the definition format (or what we need to define at all).

2) Regarding clean recovery from a memory allocation failure, it looks like the easiest way to handle it without cluttering the source code too much is mainly by using longjmp() (but malloc() in certain functions will still need special handling on failure). Is such a solution fine with you?
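Roughly, I mean something like this (a sketch only; the real code would also have to release partially built structures before returning, and the jmp_buf would live in a per-call context rather than a global):

#include <setjmp.h>
#include <stdlib.h>

static jmp_buf oom_env;

static void *xalloc_sketch(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        longjmp(oom_env, 1);   /* unwind back to the API entry point */
    return p;
}

/* Hypothetical API entry point. */
int parse_something(void)
{
    if (setjmp(oom_env) != 0)
    {
        /* We arrived here via longjmp(): report the error instead of exiting. */
        return -1;
    }
    /* ... normal processing, calling xalloc_sketch() freely ... */
    return 0;
}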

ampli commented 7 years ago

BTW, long ago I did tests of setting the connector length of LL links and didn't find an improvement. But I checked that on batch-file benchmarks, in which it only introduced some slowness, because only null_count==0 is tried in those runs.

linas commented 7 years ago

a loop over a large number of disjuncts/connectors

Where is that loop? It would be better if that loop was a part of the dictionary setup...

it may be a good idea to allow setting connector length also for individual links

Yes. It would be even better to assign a cost, so that longer links are allowed but have a high cost.

longjmp

Could we put C++ into exalloc, throw an exception, and put main() in C++ to catch the exception? I assume exceptions can be thrown through C. Exceptions are nicer than longjmp but, on the other hand, wrapping our code with C++ just to throw seems awfully complicated too.

ampli commented 7 years ago

Where is that loop? It would be better if that loop was a part of the dictionary setup...

set_connector_length_limits(), invoked from prepare_to_parse(). I have not yet implemented the idea of "connector descriptors" to which disjunct connectors would point (and it is not yet clear it will be faster, since although it may use much less memory, it adds an extra indirection), so we cannot do the actual assignment of the length_limit at dictionary setup. However, maybe it can be done more efficiently when the connectors are created.
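For reference, the idea of that loop is roughly this (a simplified sketch; the structure and accessor names here are approximate, not the exact internals):

#include <string.h>

/* Sketch: walk every connector of every disjunct of every word and
 * clamp the length_limit of connectors whose name starts with "LL".
 * The real set_connector_length_limits() is called from
 * prepare_to_parse(), i.e. once per parse, not at dictionary setup. */
static void clamp_LL_length(Sentence sent)
{
    for (size_t w = 0; w < sent->length; w++)
    {
        for (Disjunct *d = sent->word[w].d; d != NULL; d = d->next)
        {
            for (Connector *c = d->left; c != NULL; c = c->next)
                if (0 == strncmp(connector_name(c), "LL", 2))
                    c->length_limit = 1;
            for (Connector *c = d->right; c != NULL; c = c->next)
                if (0 == strncmp(connector_name(c), "LL", 2))
                    c->length_limit = 1;
        }
    }
}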

Yes. It would be even better to assign a cost, so that longer links are allowed but have a high cost.

I will add this.

I indeed prefer not to add support for C++ exceptions, since implementing it with longjmp() is much simpler in our case.

ampli commented 7 years ago

Yes. It would be even better to assign a cost, so that longer links are allowed but have a high cost.

But then, how can we prune these connectors?

linas commented 7 years ago
ampli commented 7 years ago

I'm confused. I think you are saying that you want to add an LL link-length limit to set_connector_length_limits, right? Because there isn't one there now.

I indeed would like to add that, and in the same occasion to add a general infrastructure for setting connector lengths according to connector names.

I asked

how can we prune these connectors?

I referred to the value to assign to the length_limit of the corresponding connector before the pruning step. Say a certain connector type has a length_limit of l1 up to cost c, and a higher one l2 for cost>c.

Do you mean that, for example, some connector X may have length_limit=1 up to cost=2.7, length_limit=10 for 2.7<cost<4, and unlimited for cost>4, and that I assign it before the pruning according to cost_max?

See the end of this comment for a suggested definition format. (If you mean that, then of course it is clear what to do.)

I don't understand "connector descriptors" -- there's no fat in Connector_struct

See issue #198; I continue it here, but if desired we can continue it there or open a new issue instead of #198. I have already implemented most of this idea. It turns out it is possible to shrink the Connector_struct size from 64 bytes to even 8 or 4 bytes if this is really desirable. In my current test implementation it is 32 bytes, but this implementation is a work in progress (mostly done). Of course I will not suggest shrinking it if my tests don't show better performance on very long sentences.

Here is its current form (32 bytes) but I plan to cut some fields:

struct Connector_struct
{
    uint8_t length_limit; /* Can be different than in the descriptor */
    uint8_t nearest_word;
                          /* The nearest word to my left (or right) that
                             this could ever connect to.  Computed by
                             setup_connectors() */
    bool multi;           /* TRUE if this is a multi-connector */
    const condesc_t *desc;
    Connector *next;
    /* Hash table next pointer, used only during pruning. */
    union
    {
        Connector * tableNext;
        const gword_set *originating_gword;
    };
};

All the rest of the fields are in struct ConDesc, which doesn't change after dictionary reading and has one element per connector type.
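Roughly, the descriptor would hold something like this (a sketch only; the exact field list is still in flux):

#include <stdint.h>

/* Sketch of the per-connector-type descriptor: one instance per
 * connector type, shared by all connectors of that type, and
 * read-only after dictionary reading. */
typedef struct
{
    const char *string;      /* connector name, e.g. "LL" or "PLx" */
    uint8_t length_limit;    /* default length limit for this type */
    /* ... any other per-type data that never changes after the
       dictionary is read, e.g. a precomputed hash ... */
} condesc_t;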

Here is an 8-byte version: EDIT: Removed " (when using multi:1 desc can be desc:23 to support 8M connector types)".

struct Connector_struct
{
    uint8_t length_limit; /* Can be different than in the descriptor */
    uint8_t nearest_word;
    bool multi;               /* TRUE if this is a multi-connector */
    const int desc;        /*  Supports up to 4G connector types. */
};

If very much desired, a 4-byte version is possible too: EDIT: Not really, see my post below.

struct Connector_struct
{
    uint8_t length_limit; /* Can be different than in the descriptor */
    uint8_t nearest_word;
    int multi:1;                /* TRUE if this is a multi-connector */
    const int desc[15];   /* Supports up to 32K connector types. */
};

(The memory freeing and tableNext are implemented in another way.)

Returning to the subject of this issue: I need your input regarding the question of length_limit and costs. Is something like this (in the affix file) fine?

% Connectors starting with LL are length_limit=1
LL*  1: LENGTH_LIMIT+; 

% For the connector PLx, the length limit depends on the cost, as follows:
% length_limit=1 for cost<=2.7
% length_limit=3 for 2.7<cost<4
% length_limit=5 for cost>4
PLx  1 2.7   3 4   5: LENGTH_LIMIT+; 

% length_limit=4 for cost<=2.7
% length_limit=UNLIMITED_LEN for cost>2.7
LLX  4 2.7: LENGTH_LIMIT+;
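At parse time, applying such a definition before pruning would then be roughly the following (a sketch; the function and parameter names are only illustrative):

/* Sketch: pick the length_limit that applies for the current cost_max,
 * given the (length_limit, cost) cutoff pairs from the affix file.
 * For "PLx  1 2.7   3 4   5" the pairs are (1, 2.7) and (3, 4),
 * and 5 is the final, unconditional limit. */
static int length_limit_for_cost(const int limits[], const double cost_cutoffs[],
                                 int npairs, int final_limit, double cost_max)
{
    for (int i = 0; i < npairs; i++)
    {
        if (cost_max <= cost_cutoffs[i])
            return limits[i];
    }
    return final_limit;
}

So, for example, with cost-max at 2.7 the PLx connectors would get length_limit=1, and only raising cost-max past the later cutoffs would allow the longer links.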

BTW, can the en dict YS be defined as length_limit=1?

ampli commented 7 years ago

If very much desired, a 4-byte version is possible too:

I actually forgot the gword_set field, which needs about 10 bits. So the 4-byte version of Connector_struct is not possible... but the 8-byte version still is.

ampli commented 7 years ago

The gword_set field in Connector_struct is used in the fast-matcher to filter out mismatching alternatives. The purpose of introducing this filtering was to eliminate the mostly insane-morphism linkages of the ady/amy languages.

It is possible to eliminate it from Connector_struct (to get it to 4 bytes) and still filter out the insane-morphism linkages as now, if we add disjunct arguments to do_count() and pass them to form_match_list(). I already tested adding such disjunct arguments (for another purpose) and the additional overhead was unnoticeable.

EDIT: I am adding this here so as not to generate a large number of notification emails...

In order not only to save memory but also to reduce cache misses, I guess more should be done. I thought of allocating connectors in blocks of 64 bytes (i.e., 8 connectors at once in the case of 8-byte connectors).

linas commented 7 years ago

I have no objection to these changes, they seem reasonable.

ampli commented 6 years ago

Are xalloc() / xfree() (and exalloc() / exfree()), designed for memory usage tracking and limiting, needed any more?

Pros:

Cons:

Options:

  1. Fix the code to use xalloc() / xfree() (and exalloc() / exfree()) as originally designed (see the sketch after this list).
  2. Use them only on connector/disjuncts/big tables.
  3. Don't use them at all, and forget about the max_memory option and resources_memory_exhausted().
  4. Still use max_memory option and resources_memory_exhausted(), but only to limit the connector-pair table.
  5. Do nothing special, and use only direct malloc() / free() (or, for the newly proposed code, a malloc() wrapper that can reset itself if malloc() fails).
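For reference, "as originally designed" (option 1) means roughly the following kind of tracking wrapper (a sketch with illustrative names, not the actual xalloc() code):

#include <stdlib.h>

static size_t space_in_use = 0;    /* would be checked against max_memory */

/* Sketch: prepend the size to each allocation so the total number of
 * bytes in use can be tracked and a memory limit enforced. */
static void *xalloc_tracked(size_t size)
{
    size_t *p = malloc(size + sizeof(size_t));
    if (p == NULL) return NULL;
    *p = size;                     /* remember the size for xfree_tracked() */
    space_in_use += size;
    return p + 1;
}

static void xfree_tracked(void *ptr)
{
    size_t *p = (size_t *)ptr - 1;
    space_in_use -= *p;
    free(p);
}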

Please advise.

linas commented 6 years ago

2 or 3 or 4 are reasonable.

Given that this bug was opened less than a year ago, option 2 or 4 seems like the "nice" thing to do. I admit 3 is appealing...

linas commented 6 years ago

Yes, the YS and YP and PH connectors in English are all length=1

linas commented 6 years ago

I just removed xalloc from the post-processing code.

linas commented 6 years ago

valgrind --leak-check=full link-parser < ../data/en/corpus-basic.batch says

==25945== HEAP SUMMARY:
==25945==     in use at exit: 0 bytes in 0 blocks
==25945==   total heap usage: 54,015,957 allocs, 54,015,957 frees, 2,313,157,426 bytes allocated
==25945== 
==25945== All heap blocks were freed -- no leaks are possible

this was with !batch turned off, !link and !spell turned on. There are about 1000 sentences in there, so there are 54K allocs per sentence, and 2.3MB alloc'ed per sentence. Both numbers seem awfully high to me...

linas commented 6 years ago

Turning off !spell, !link and !constituent gives 43K allocs per sentence, 1.6MB per sentence. Still seems high. I guess this is mostly the connectors!?

ampli commented 6 years ago

Most of the allocs are for Table_connector for the do_count() memoization. For long sentences in the fix-long batch there are >100M allocations. Generally (even for the fixes batch) a large percentage of the CPU is devoted to malloc/free.

In my rewritten hash-table code (yet to be published) I used allocation from pools, so a real malloc() is done only once per 1024 (configurable) memory requests. This causes a noticeable speedup.
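The pool idea is essentially this (a sketch; the real code also supports freeing or resetting the whole pool at once between sentences):

#include <stdlib.h>

#define POOL_BLOCK_ELEMENTS 1024   /* configurable */

typedef struct pool_block
{
    struct pool_block *next;
    char data[];                   /* POOL_BLOCK_ELEMENTS * element_size bytes */
} pool_block;

typedef struct
{
    pool_block *blocks;            /* block list, kept so everything can be freed at once */
    size_t element_size;
    size_t used_in_block;          /* elements handed out from the newest block */
} pool;

/* Sketch: one real malloc() per POOL_BLOCK_ELEMENTS requests. */
static void *pool_alloc(pool *p)
{
    if ((p->blocks == NULL) || (p->used_in_block == POOL_BLOCK_ELEMENTS))
    {
        pool_block *b = malloc(sizeof(pool_block) +
                               POOL_BLOCK_ELEMENTS * p->element_size);
        if (b == NULL) return NULL;
        b->next = p->blocks;
        p->blocks = b;
        p->used_in_block = 0;
    }
    return p->blocks->data + (p->used_in_block++ * p->element_size);
}

Freeing is then a single pass over the block list, which also eliminates the per-element free() calls.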

ampli commented 6 years ago

@s-e-r-g, the problem of huge memory/CPU use when parsing long Russian sentences has been partially solved by PR #673, which was applied a short time ago. On my computer (64-bit) this sentence is now parsed in about one CPU minute instead of about one CPU hour, using about 1.2GB of memory.

linas commented 6 years ago

Yes, I can confirm that. Thank you Amir! Without your persistence, this would not have gotten cured. (General discussion in #632)