mbdavid / LiteDB

LiteDB - A .NET NoSQL Document Store in a single data file
http://www.litedb.org
MIT License

Problems with inserting strings larger than 3500 bytes #83

Closed dsoronda closed 8 years ago

dsoronda commented 8 years ago

I have these classes:

    public class ImageFileInfo {
        public int Id { get; set; }
        public string FileName { get; set; }
        public string Extension { get; set; }
        public long FileSize { get; set; }
        public ImageInfo ImageInfo { get; set; }
        public IList Thumbnails { get; set; }
        public string RootPath { get; set; }
        public string Directory { get; set; }
        public string RelativePath { get; set; }
        public string FullPath { get; set; }
    }

    public class ImageInfo {
        public int Width { get; set; }
        public int Height { get; set; }
        public byte ColorDepth { get; set; }
        public int CompressionQuality { get; set; }
        public string ImageMagicInfoString { get; set; } // <-- 3500 bytes when updating
    }

When I try to update ImageFileInfo in the DB, I get this exception: Additional information: Index key must be less than 512 bytes

I did not create any additional indexes and this error does not occur if ImageMagicInfoString is empty or smaller than 250 bytes.

mbdavid commented 8 years ago

Hi @dsoronda, LiteDB automatically creates an index when you run a query on a non-indexed field. This could be what happened to you. Try using LiteDB.Shell to check with db.your-collection.indexes. Try dropping this index and inserting again.

If you need to search on this field again, try using LINQ: collection.Find(..<indexed_columns>..).Where(x => x.ImageMagicInfoString == "123")

dsoronda commented 8 years ago

I didn't run a query on that field. I updated ImageFileInfo.ImageInfo with new data (before that, ImageInfo was null). So basically what I did was something like:

    using ( var db = new LiteDatabase( this.ConnectionString ) ) {
        var col = db.GetCollection( collectionName );
        var entity = col.Find( x => x.ImageInfo == null, pager: new Pager { Take = 1 } );
        entity.ImageInfo = new ImageInfo() { Width = 100, Height = 100, ImageMagicInfoString = new string( 'x', 4000 ) };
        repo.update( entity ); // <- this throws exception: Index key must be less than 512 bytes
    }

So, for now, I can't put any string larger than 512 bytes in LiteDB because it is auto-indexed? How can I mark this field as ignored (don't create an index), since I really don't need an index on that field?

mbdavid commented 8 years ago

Yes, you can use fields (string/byte[]) with more than 512 bytes, no problem. You just can't index these fields (or rather, you can create the index, but inserting a new value longer than 512 bytes will cause an exception).

Did you check with the shell that there is no index on this field? Or can you create a unit test to reproduce this problem?

dsoronda commented 8 years ago

    // this is a simple console application
    public class SiteRip {
        public int Id { get; set; }
        public string Url { get; set; }
        public string Html { get; set; }
    }

    internal class Program {
        private static void Main( string[] args ) {
            var dbFileName = @"d:\data\testfile.litedb";
            var entity = new SiteRip( ) {Url = @"https://github.com/mbdavid/LiteDB"};
            // first session: insert a document whose Html is still null
            using ( var db = new LiteDatabase( dbFileName ) ) {
                var col = db.GetCollection<SiteRip>( "siteRip" );
                int result = col.Insert( entity );
            }

            SiteRip updateEntity = null;
            // second session: query on Html, then try to store a long string in it
            using ( var db = new LiteDatabase( dbFileName ) ) {
                var col = db.GetCollection<SiteRip>( "siteRip" );
                var indexes = col.GetIndexes( ).ToList( );
                updateEntity = col.Find( x => x.Html == null ).Single( );
                var indexes_after = col.GetIndexes( ).ToList( ); // got index on Html
                updateEntity.Html = new string( 'a', 4000 );
                col.Update( updateEntity ); // EXCEPTION
            }
        }
    }

So, I have to drop the index before updating the Html field to make this example work. I really don't need an index on that field (can I flag it as DontCreateIndex or something in the mapping)?

mbdavid commented 8 years ago

Here's the problem:

updateEntity = col.Find( x => x.Html == null ).Single( );

When you run this, LiteDB needs an index to find the documents where Html == null, so it creates one. Find(....) always needs an index. To run the query without the internal index system, write it like this: col.FindAll().Where(x => x.Html == null).Single(). Now LiteDB does a full scan of all documents, deserializes them, and returns them as an IEnumerable. Then LINQ to Objects checks each object's Html value.

A long time ago (v1.0.0), I implemented a version in which LiteDB did an internal full scan when there was no index. Take a look here:

https://github.com/mbdavid/LiteDB/commit/e2b3df2066ed22492cf9268235a3cf7c42fc5734

But I removed it later and implemented auto-indexing so that queries run fast.

dsoronda commented 8 years ago

For me this is a real drawback :(

What if I have millions of records? Do I have to pull them all into memory just to get, e.g., 10 records? I really need something like FindAll(x => x.Html == null, skip: 0, take: 100).ToList();

I don't care if there is no index on that field; LiteDB can still run a full scan on disk (I have an SSD :) and I will get proper results.

I understand that speed is good, but I also need to be able to say when to use indexes and when not to.

mbdavid commented 8 years ago

With IEnumerable you will not pull all 1 million documents into memory: you load one document, check whether Html is null, discard it, and move on to the next. There is no other solution without an index. I removed my old implementation because LINQ to Objects does the same thing (and does it much better, with support for more complex queries, like x.Html.Length > x.Title.Length). The same applies to OrderBy.
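
The streaming behaviour described above can be sketched without LiteDB at all. This is only an illustration under assumptions: Doc, LazyScanDemo, FullScan and FirstWithoutHtml are hypothetical names, and FullScan is a plain iterator standing in for col.FindAll(), not LiteDB's actual implementation. It counts how many documents get materialized when a LINQ to Objects filter and Take run over a lazy sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a stored document; not a LiteDB type.
public class Doc
{
    public int Id { get; set; }
    public string Html { get; set; }
}

public static class LazyScanDemo
{
    // How many documents the fake full scan has produced so far.
    public static int Materialized;

    // Stand-in for col.FindAll(): yields one document at a time, on demand.
    public static IEnumerable<Doc> FullScan(int total)
    {
        for (int i = 1; i <= total; i++)
        {
            Materialized++;
            // In this fake data set every third document has Html == null.
            yield return new Doc { Id = i, Html = (i % 3 == 0) ? null : "<html/>" };
        }
    }

    // Filter in LINQ to Objects and stop as soon as 'take' matches are found.
    public static List<Doc> FirstWithoutHtml(int total, int take)
    {
        Materialized = 0;
        return FullScan(total).Where(d => d.Html == null).Take(take).ToList();
    }

    public static void Main()
    {
        var page = FirstWithoutHtml(1000000, 10);
        // prints "returned 10, materialized 30"
        Console.WriteLine($"returned {page.Count}, materialized {Materialized}");
    }
}
```

Because Where and Take are deferred, the scan stops at the tenth match instead of walking the whole million. A Skip further into the sequence still has to walk past every document before it, just one at a time rather than all at once.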

LiteDB is a local database, so it's not like a DBMS on a remote server, where every operation must be executed on the server to avoid transferring all the data to the client. In LiteDB there is only the "server".

If you only want to check whether your string field is null, you can add a property like this:

public bool HtmlIsNull { get { return this.Html == null; } }

and query on this field (using an index).

dsoronda commented 8 years ago

You are right. This is how I implemented it in my repository:

    class Repo ..... {
        string ConnectionString = "...";
        string collectionName = "...";

        public IList<T> FindAll( Func<T, bool> findFilter, int skip, int take ) {
            using ( var db = new LiteDatabase( this.ConnectionString ) ) {
                var col = db.GetCollection<T>( this.collectionName );
                // no index created :)
                var result = col.FindAll().Where( findFilter ).Skip( skip ).Take( take ).ToList();
                return result;
            }
        }

        ....

You could put it in the examples or something.

Thanks for quick reply.