named-data / python-ndn

An NDN client library with AsyncIO support in Python 3
https://python-ndn.readthedocs.io/en/latest
Apache License 2.0

Low Performance when Preparing Data #33

Closed justincpresley closed 8 months ago

justincpresley commented 2 years ago

OS: Ubuntu 20.04
Forwarder: NFD

Problem: When using python-ndn to build Data packets from 8000-byte chunks (streamed from an FTP server), I have found that app.prepare_data() is really expensive / performs poorly.

Testing with a 100 MB file, just streaming the file and writing the bytes to disk takes 1.99 seconds. Running the same process but also preparing each chunk with app.prepare_data() (and doing nothing with the resulting packet, for comparison purposes) takes 36.75 seconds. The gap also scales with the data size, which is logical since I am calling prepare_data() more often. Is this expected? Is there something in particular that makes prepare_data take a while? Are there any alternatives to prepare_data?
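For scale, a back-of-the-envelope check of the per-packet overhead implied by the numbers above (assuming "100MBs" means 100 × 10^6 bytes; the timing figures are taken from the report, not re-measured):

```python
# Derived from the figures reported above: 100 MB streamed in 8000-byte
# chunks, 1.99 s without prepare_data() vs 36.75 s with it.
file_size = 100 * 1_000_000          # assumption: 100 MB = 100 * 10^6 bytes
chunk_size = 8000
num_chunks = -(-file_size // chunk_size)   # ceiling division -> 12500 chunks

baseline_s = 1.99        # stream + write to disk only
with_prepare_s = 36.75   # same, plus one prepare_data() call per chunk

overhead_per_packet_ms = (with_prepare_s - baseline_s) / num_chunks * 1000
print(f"{num_chunks} chunks, ~{overhead_per_packet_ms:.2f} ms overhead per packet")
```

That works out to roughly 2.8 ms of prepare_data() overhead per packet, which is a lot for an encode step but plausible if each call performs a public-key signature.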

I tested this using the script below. If you run it yourself, you will have to set up a file server using vsftpd and create a guest user with /files/mbstest in its home directory.

from ftplib import FTP
import time
import logging
from ndn.app import NDNApp
from ndn.encoding import Name, Component, MetaInfo

def ftp_download(app, translation):
    ftp = FTP(translation["host"],
              translation["username"] if translation["username"] != "null" else "anonymous",
              translation["password"] if translation["password"] != "null" else "")

    size = ftp.size(translation["filename"])
    if not size:
        return False
    final_segment_num = (size//8000)-1 if (size%8000==0) else (size//8000)
    final_id = Component.from_segment(final_segment_num)
    mi = MetaInfo(freshness_period=1000, final_block_id=final_id)
    logging.info(f'Size: {size}, Final_segment: {final_segment_num}, Meta: {mi}')

    packet_number = 0
    int_name = Name.from_str("/GUEST/FILES/MBSTEST")
    info = open('file_name', 'wb')

    def handle_ftp_binary(byte_chunk):
        nonlocal packet_number, mi
        info.write(byte_chunk)
        #data_packet = app.prepare_data(int_name + [Component.from_number(packet_number, Component.TYPE_SEGMENT)], byte_chunk, meta_info=mi)
        packet_number = packet_number + 1

    logging.info('Streaming the file now')
    resp = ftp.retrbinary("RETR " + translation["filename"], callback=handle_ftp_binary, blocksize=8000)
    logging.info('Streaming complete')
    info.close()
    ftp.quit()
    app.shutdown()

def main(app):
    logging.basicConfig(format='[%(asctime)s]%(levelname)s:%(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S',
                        level=logging.INFO)
    start = time.time()
    translation = {}
    translation["host"] = "localhost"
    translation["username"] = "guest"
    translation["password"] = "welcomehere"
    translation["filename"] = "/files/mbstest"
    ftp_download(app, translation)
    print(f'Total time: {time.time() - start} seconds')

if __name__ == "__main__":
    app = NDNApp()
    try:
        app.run_forever(after_start=main(app))
    except FileNotFoundError:
        logging.warning('Error: could not connect to NFD')
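As a side note, the final_block_id arithmetic in the script is the usual ceiling-minus-one. A small sanity check (not part of the original report) that the expression equals ceil(size / 8000) - 1, i.e. the zero-based index of the last segment:

```python
import math

def final_segment_num(size: int, block: int = 8000) -> int:
    # Same expression as in the script above
    return (size // block) - 1 if size % block == 0 else size // block

# Edge cases: exact multiple, one short, one over, and the 100 MB test file
for size in (1, 7999, 8000, 8001, 16000, 100_000_000):
    assert final_segment_num(size) == math.ceil(size / 8000) - 1
print("final segment index matches ceil(size/8000) - 1")
```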
zjkmxy commented 2 years ago

Hello. Due to a time conflict, I have to delay this to next week. Things that may be slow are:

zjkmxy commented 8 months ago

Hello. After some investigation, I think the most time-consuming part is signing. Preparing the data packets with only a SHA-256 digest takes about 2 s, but with an ECC key it takes about 40 s.

Here is the call graph showing the hot path:

[call graph image]

Since there is nothing I can do for this specific performance issue (prepare_data), I will close the issue for now.
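As a rough consistency check of this conclusion (derived from the figures reported in this thread, not newly measured): a 100 MB file in 8000-byte chunks means about 12,500 signatures, so ~40 s total implies a few milliseconds per signature, which is in the right ballpark for ECDSA signing in software:

```python
# Figures from the thread: 100 MB file, 8000-byte chunks, ~40 s total
# with an ECC key. The per-signature cost below is derived, not measured.
num_packets = 100 * 1_000_000 // 8000   # 12500 packets
total_s = 40.0                          # reported time with ECC signing
per_signature_ms = total_s / num_packets * 1000
print(f"~{per_signature_ms:.1f} ms per signature")
```

At ~3 ms per signature, signing alone accounts for essentially all of the reported slowdown, consistent with the 2 s figure when only a SHA-256 digest is used.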