xErik / lemmatizerx

MIT License

Does lemmatizerx use Network? #1

Closed: jsrimr closed this issue 1 year ago

jsrimr commented 1 year ago

When I try to split the sentence into lemmas, the process gets really slow.

I suspect I may have been blocked by a server because of too many network requests, but I'm not sure whether lemmatizerx uses the network at all. Could you verify this?

Thanks,

xErik commented 1 year ago

Hello!

It does not use the network; it is a standalone package.

Repeated calls should not slow it down.

Can you give me some example code, please? Then I will try to replicate the problem.

jsrimr commented 1 year ago

Thanks for your reply! Here are the code snippets:

  Future<List<dynamic>> loadWords() async {
    // return lemmatized words, and their translations
    var words = sentence.content.split(' ')
      ..removeWhere((element) => element.length <= 3)
      ..removeWhere((element) => element.contains(RegExp(r'[0-9]')))
      ..removeWhere((element) =>
          element.contains(RegExp(r'[!@#<>?":_`~;[\]\\|=+)(*&^%0-9-]')))
      ..removeWhere((element) => element.contains(RegExp(r'[.,]')));

    var results = <dynamic>[];
    for (var i = 0; i < words.length; i++) {
      var word = words[i];
      // var translation = await getTranslation(word);
      // var lemmatizedWord = lemmatizer.lemmasOnly(word)[0];
      // results.add([lemmatizedWord, translation]);
      results.add([word, '']);
    }
    return results;
  }

And I use the result with the code below:

  @override
  Widget build(BuildContext context) {
    return FutureBuilder<List<dynamic>>(
        future: loadWords(),
        builder: (context, snapshot) {
          if (snapshot.hasData) {
            return Expanded(
              child: ListView.builder(
                  itemCount: snapshot.data!.length,
                  itemBuilder: (context, index) {
                    return Card(
                        color: Colors.white,
                        elevation: 6,
                        margin: const EdgeInsets.all(10),
                        shadowColor: Colors.purple,
                        child: ListTile(
                          title: Text(snapshot.data![index][0].toString()),
                          // subtitle: Text(snapshot.data![index][1].toString()),
                          trailing: InkWell(
                            onTap: () {
                              addVocab(snapshot.data![index][0].toString(), snapshot.data![index][1].toString());
                            },
                            child: const Text('Save'),
                          ),
                        ));
                  }),
            );
          } else {
            return const Center(child: CircularProgressIndicator());
          }
        });
  }

Thanks!

xErik commented 1 year ago

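The slowdown is probably caused by how FutureBuilder is used: passing loadWords() directly as the future: argument inside build() starts a new Future, and therefore repeats all the work, every time the widget rebuilds.

Search for "futurebuilder called many times" for a clearer explanation of the FutureBuilder issue.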
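The effect can be illustrated without Flutter. The following is a minimal pure-Dart sketch, not Flutter code: loadWords() here is a hypothetical stand-in that only counts its calls, buildCallingDirectly() mimics future: loadWords() inside build(), and CachedLoader mimics a State field assigned once and reused on every rebuild.

```dart
// Counts how many times the (hypothetical) expensive loader actually runs.
int loadCalls = 0;

// Stand-in for splitting and lemmatizing a sentence. An async function body
// runs synchronously until its first await, so the counter is bumped on call.
Future<List<String>> loadWords() async {
  loadCalls++;
  return 'Ever dreamed of owning your own secret citadel'.split(' ');
}

// Mimics `FutureBuilder(future: loadWords(), ...)` inside build():
// every rebuild calls loadWords() again and creates a fresh Future.
Future<List<String>> buildCallingDirectly() => loadWords();

// Mimics a Future stored once (e.g. assigned in initState) and passed to
// FutureBuilder on every rebuild: the late final field is initialized lazily,
// exactly once, so the work runs a single time.
class CachedLoader {
  late final Future<List<String>> words = loadWords();

  Future<List<String>> build() => words;
}
```

Simulating five rebuilds, the first variant runs the loader five times and the second runs it once.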

Please refer to the untested demo code below and let me know whether this approach helps:

import 'package:flutter/material.dart';

class Test extends StatefulWidget {
  @override
  TestState createState() => TestState();
}

class TestState extends State<Test> {
  late final Future<List<List<String>>> myFuture;

  @override
  void initState() {
    // Call super.initState() first, then create the Future exactly once.
    super.initState();
    myFuture = _loadWords();
  }

  Future<List<List<String>>> _loadWords() async {
    // Split on every run of characters that are NOT within a-zA-Z,
    // including German umlauts. This RegEx should work; please test it.

    var words = sentence.content.split(RegExp(r'[^a-zA-ZüäöÜÄÖ]+'))
      ..removeWhere((word) => word.length <= 3);

    List<List<String>> results = [];

    for (var i = 0; i < words.length; i++) {
      final word = words[i];
      // final translation = await getTranslation(word);
      // final lemmatizedWord = lemmatizer.lemmasOnly(word)[0];
      // results.add([lemmatizedWord, translation]);
      results.add([word, '']);
    }
    return results;
  }

  @override
  Widget build(BuildContext context) {
    return FutureBuilder<List<List<String>>>(
        future: myFuture, // THIS FUTURE HOLDS DATA, IT IS NOT A METHOD CALL
        builder: (context, snapshot) {
          if (snapshot.hasData) {
            final data = snapshot.data!;

            return Expanded(
              child: ListView.builder(
                  itemCount: data.length,
                  itemBuilder: (context, index) {
                    final lemma = data[index][0];
                    final trans = data[index][1];

                    return Card(
                        color: Colors.white,
                        elevation: 6,
                        margin: const EdgeInsets.all(10),
                        shadowColor: Colors.purple,
                        child: ListTile(
                          title: Text(lemma),
                          subtitle: Text(trans),
                          trailing: InkWell(
                            onTap: () {
                              addVocab(lemma, trans);
                            },
                            child: const Text('Save'),
                          ),
                        ));
                  }),
            );
          } else {
            return const Center(child: CircularProgressIndicator());
          }
        });
  }
}

jsrimr commented 1 year ago

Well, I'm using FutureBuilder inside a StatelessWidget, so I don't think this is the "futurebuilder called many times" issue.

I've also tried changing the widget to a StatefulWidget and passing a stored Future instance instead of calling _loadWords(), but nothing seems to change 😥

jsrimr commented 1 year ago

I copied your snippet, and things seemed to work..!

However, they didn't 😭.

      final word = words[i];
      // final translation = await getTranslation(word);
      // final lemmatizedWord = lemmatizer.lemmasOnly(word)[0];
      // results.add([lemmatizedWord, translation]);
      results.add([word, '']);

The code is not using the lemmatizer at all, which means nothing was actually fixed.

My apologies for uploading incorrectly commented-out code. I was comparing the performance with and without the lemmatizer.

xErik commented 1 year ago

Please provide complete code that allows me to replicate the problem.

Also, make sure await getTranslation(word); is not responsible for the delay.
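One way to check this is to time the two stages separately. The sketch below is a hypothetical helper, not part of lemmatizerx: lemmatize and translate are stand-ins for a wrapper around lemmatizer.lemmasOnly(word) and the app's own getTranslation, and Stopwatch is used to accumulate the time spent in each.

```dart
/// Total time spent in each stage of the (hypothetical) word pipeline.
class StageTimes {
  Duration lemmatizing = Duration.zero;
  Duration translating = Duration.zero;
}

/// Runs both stages for every word and accumulates how long each one took.
/// [lemmatize] and [translate] are placeholders supplied by the caller.
Future<StageTimes> timeStages(
  List<String> words,
  String Function(String) lemmatize,
  Future<String> Function(String) translate,
) async {
  final times = StageTimes();
  final sw = Stopwatch();
  for (final word in words) {
    // Time the synchronous lemmatization step.
    sw
      ..reset()
      ..start();
    final lemma = lemmatize(word);
    sw.stop();
    times.lemmatizing += sw.elapsed;

    // Time the asynchronous translation step, including the await.
    sw
      ..reset()
      ..start();
    await translate(lemma);
    sw.stop();
    times.translating += sw.elapsed;
  }
  return times;
}
```

If translating dominates the totals, the delay comes from the translation call rather than from the lemmatizer.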

This code takes about 700 milliseconds when run with flutter test:


import 'package:flutter_test/flutter_test.dart';
import 'package:lemmatizerx/lemmatizerx.dart';

final lemmatizer = Lemmatizer();
const text =
    '''Ever dreamed of owning your own secret citadel in a beautiful region of Italy, wandering along its fortified walls like a monarch surveying their kingdom?
For less than the price of a townhouse in central London or an attic apartment in Rome's historic center -- that dream can now come true.
The medieval castle and hamlet of Serravalle, half-way between the cities of Modena and Bologna in Italy's northern Emilia Romagna region, is up for sale for 1.9 million euros, or about 2 million.
Already livable and fitted with heating, it requires just minimal fixes -- and the price is negotiable. 
''';

void main() {
  test('Test', () {
    final d0 = DateTime.now();

    var words = text.split(RegExp(r'[^a-zA-ZüäöÜÄÖ]+'))
      ..removeWhere((word) => word.length <= 3);
    List<List<String>> results = [];

    for (var i = 0; i < words.length; i++) {
      final word = words[i];
      // final translation = await getTranslation(word);
      final lemmas = lemmatizer.lemmasOnly(word);
      if (lemmas.isNotEmpty) {
        final lemmatizedWord = lemmas.elementAt(0);
        results.add([lemmatizedWord, '']);
      }
    }

    final d1 = DateTime.now();
    final diff = d1.difference(d0);

    print('${diff.inMilliseconds} milliseconds');

    print(results);
  });
}
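The tokenization step used in both of the maintainer's snippets can be tested on its own. This is a minimal sketch of that split-and-filter logic as a standalone function; adding ß to the character class is an assumption of this sketch, since the thread's RegExp omits it and a word such as "Straße" would otherwise be split into fragments.

```dart
/// Splits [text] on every run of characters outside a-z, A-Z and the German
/// umlauts (plus ß, an addition in this sketch), then drops tokens of three
/// characters or fewer, mirroring the `length <= 3` filter in the thread.
List<String> tokenize(String text) {
  return text
      .split(RegExp(r'[^a-zA-ZüäöÜÄÖß]+'))
      .where((word) => word.length > 3)
      .toList();
}
```

The trailing `+` makes each run of punctuation and spaces a single separator, so fewer empty strings are produced; any that remain are removed by the length filter.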